1 Task and executive summary

In this report, a 2D resource evaluation of a Co-Ni deposit is conducted. Different kriging and Gaussian cosimulation methods are compared with regard to their prediction error. Finally, one method is chosen and applied to conduct a grade-tonnage estimation of the deposit. The assigned file contains one layer of data from the Murrin Murrin nickel laterite deposit in Western Australia. The deposit consists of four lithologies:

  • FZ = (ferruginous zone),
  • SA = (saprolitic zone),
  • SM = (smectitic zone), and
  • UM = (“fresh” ultramafic rocks)

The metallogenetic model considers that Ni (and to a lesser degree Co) is washed out of the FZ and concentrated in the SA and SM zones, while Mg is completely removed from the system. Hence, there are systematic changes in the composition of the ore depending on the zone.

This work consists of analyzing this data set with a general exploratory data analysis, a spatial exploratory analysis, variography, indicator kriging and cokriging, as well as cosimulation. Further, a crossvalidation is done to evaluate the resource distribution in the assigned slice of the deposit. This provides an overview of the effect of choosing different interpolation methods. In total, six methods are compared: three kriging methods and three Gaussian simulation methods. The methods are as follows:

Kriging:

  • VK1: Cokriging, log-transformed values, omnidirectional
  • VK2: Mixed-model cokriging, log-transformed values, incl. lithology, anisotropy
  • VK3: Cokriging, logratio-compositions, no lithology, omnidirectional
  • IK: Indicator kriging for lithology, anisotropy

Simulation:

  • VS1: Cosimulation, log-transformed values, omnidirectional
  • VS2: Mixed-model cosimulation, log-transformed values, incl. lithology, anisotropy
  • VS3: Cosimulation, logratio-compositions, no lithology, omnidirectional

As such, the experimental plan roughly follows the scheme of

  1. comparing the Gaussian simulation methods to the “classic” kriging methods and
  2. assessing different levels of model complexity and their impact on the result.

While this testing plan does not strictly follow a Design of Experiments approach, it can be viewed as a two-factor testing plan with two levels for factor (1) and three levels for factor (2).

The six main variants then are compared to each other and a final model is chosen and applied for a preliminary quantitative resource estimation. The selection of the final model is based on the mean absolute error (MAE), mean relative error (MRE), mean deviation (MD), and crossvalidation scatterplots.
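The three error measures can be written down compactly. A minimal sketch in base R, where `obs` and `pred` are hypothetical vectors of observed values and cross-validated predictions (names chosen for illustration, not taken from the workflow below):

```r
# Error measures used for model selection (obs/pred are stand-in vectors):
mae <- function(obs, pred) mean(abs(pred - obs))        # mean absolute error
mre <- function(obs, pred) mean(abs(pred - obs) / obs)  # mean relative error
md  <- function(obs, pred) mean(pred - obs)             # mean deviation (bias)

obs  <- c(0.5, 1.0, 2.0)
pred <- c(0.6, 0.9, 2.2)
round(c(MAE = mae(obs, pred), MRE = mre(obs, pred), MD = md(obs, pred)), 4)
```

A crossvalidation scatterplot then simply plots `pred` against `obs`; the signed MD reveals systematic over- or underestimation, which the symmetric MAE cannot.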

For the final model, grade-tonnage curves are computed to estimate the resource potential of the area. Since the final model uses simulation methods, a family of grade-tonnage curves is produced to estimate the variation and uncertainty of the resource estimation.
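The computation behind a grade-tonnage curve can be sketched as follows, assuming a vector of block grade estimates (or one simulated realization) and a constant tonnage per block; all names are hypothetical:

```r
# For each cutoff grade: tonnage above cutoff and the mean grade above cutoff.
grade_tonnage <- function(grades, cutoffs, block_tonnage = 1) {
  data.frame(
    cutoff     = cutoffs,
    tonnage    = sapply(cutoffs, function(z) sum(grades >= z) * block_tonnage),
    mean_grade = sapply(cutoffs, function(z) mean(grades[grades >= z]))
  )
}
gt <- grade_tonnage(grades = c(0.2, 0.5, 0.8, 1.2), cutoffs = c(0, 0.5, 1.0))
gt
```

Applied to every simulated realization, this yields the family of curves mentioned above, whose spread visualizes the estimation uncertainty.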

The dataset is a subset of the data from a study published as: [1] Talebi, H., Mueller, U., Tolosana-Delgado, R. et al. Geostatistical Simulation of Geochemical Compositions in the Presence of Multiple Geological Units: Application to Mineral Resource Evaluation. Math Geosci 51, 129–153 (2019). https://doi.org/10.1007/s11004-018-9763-9

Notice: The nature of this work is largely driven by graphical assessments. To maintain a certain brevity, some subsections of this work are structured in parallel, meaning that the reader can activate the subsection he or she wishes to see. Code sections used to compile the calculations can be expanded by clicking on Code.

2 Preparations

For the assembly of this work, the following packages are used:

#install.packages("funModeling")
#install.packages("gridExtra")
#install.packages("corrplot")
#install.packages("kableExtra")
#install.packages("vcd")
#install.packages("cvms") # for confusion metrics and matrices
#install.packages("plotly") # for interactive plots, is not loaded into namespace since it overwrites some dplyr functions
#install.packages("ggnewscale") 
#install.packages("rasterly")

library("rasterly")     # fast plotting of larger spatial datasets with plotly
library("ggnewscale")   # allows combination of cont. and factorial data in one ggmap
library("sp")           # contains spatial data classses
library("gstat")        # classical geostatistics 
library("RColorBrewer") # color palettes
library("magrittr")     # pipe %>% 
library("dplyr")        # data manipulation verbs
library("gmGeostats")   # gstat re-interfaced / for swath plots
library("ggplot2")      # nicer plots
library("data.table")   # for fast data frame manipulation
library("knitr")        # for nicer report formatting
library("kableExtra")   # for table formatting
library("corrplot")     # for correlation visualisation
library("funModeling")  # helper functions for exploratory data analysis
library("compositions") # compositional modelling
library("gridExtra")    # for arranging multiple ggplots in a grid similar to par(mfrow)
library("vcd")          # slightly improved optics for mosaic plot
library("readr")          # more comfortable read csv fun

The provided dataset consists of 12 columns and 735 observations. By default, the columns X1-Ni are imported as numerical. “Hard”, “Lcode” and “subset” can be considered factorial variables and are therefore directly converted accordingly for further use.


#import

dt <- read_csv("Grafe.csv") %>% as.data.table()

#convert columns to factor
fkt <- c("Hard", "Lcode", "subset" )
dt[,(fkt) := lapply(.SD, factor), .SDcols = fkt]

#print data table
dt %>% head() %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
X1 X Y Al Co Fe Filler Mg Ni Hard Lcode subset
23 1325 225 4.70 0.001 41.50 53.685 0.09 0.024 3 FZ training
32 1325 325 1.43 0.011 11.93 78.699 7.70 0.230 0 FZ training
52 1300 250 3.60 0.004 19.70 76.259 0.39 0.047 3 FZ training
76 1275 175 5.10 0.031 30.90 62.909 0.64 0.420 3 FZ training
100 1275 225 1.60 0.012 3.50 81.479 13.10 0.309 2 FZ training
123 1275 275 2.80 0.029 17.20 71.044 8.24 0.687 1 FZ training

3 Exploratory Data Analysis

3.1 Continuous Variables

Generally, four groups of data are present in the dataset:

  • X1, the ID-value of the sampling point
  • The spatial data: X and Y
  • The quantitative lab data for the elements Al, Co, Fe, Mg, Ni and a Filler-variable. The Filler variable is “introduced to achieve closure and to retain the intuitive relationship between each component and the mass of its associated element” [1].
  • The categorical variables Hard, Lcode and subset. Hard relates to the rock hardness and Lcode to the above-mentioned lithologies. subset divides the dataset into a part that is used for the actual modeling and a validation part. The main part of this work is done with the “training” part of the dataset.

3.1.1 Summary Statistics

The summary statistics for the numeric variables are shown in the following table.

options(knitr.kable.NA = '', digits=2, scipen=999)
 
 sumstats <- profiling_num(dt) %>% #convenient function to calculate most standard summary stats
   select(mean, std_dev, variation_coef, p_25, p_50, p_75, skewness, kurtosis) %>% #select wanted stats
  cbind (.,data.frame(min = sapply(dt[,.SD, .SDcols = is.numeric], function(x) min(x, #add min / max
         na.rm = T)), 
         max = sapply(dt[,.SD, .SDcols = is.numeric], function(x) max(x, 
         na.rm = T)) )) 
 
 names(sumstats) <- c("Mean", "Std. Dev.","Var. Coef.", "Q1", "Q2/Median", "Q3", "Skewness", "Kurtosis", "min", "max")
 #improve look and print
  sumstats %>%
   kable( table.attr = "style = \"color: black;\"")  %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed")) 
Mean Std. Dev. Var. Coef. Q1 Q2/Median Q3 Skewness Kurtosis min max
X1 9292.05 5313.11 0.57 4303.50 8693.00 14329.50 -0.31 1.4 23.00 14513.0
X 589.97 346.76 0.59 300.00 550.00 850.00 0.32 2.0 0.00 1325.0
Y 355.07 174.26 0.49 225.00 325.00 475.00 0.54 2.5 25.00 850.0
Al 4.84 3.57 0.74 1.50 4.20 7.60 0.55 2.4 0.10 17.5
Co 0.07 0.12 1.85 0.02 0.03 0.06 5.35 43.4 0.00 1.4
Fe 23.21 13.04 0.56 10.60 22.30 34.80 0.20 1.7 1.00 54.9
Filler 65.78 10.63 0.16 56.34 67.64 74.01 -0.10 2.1 40.33 93.8
Mg 5.40 6.31 1.17 0.24 1.41 11.14 0.85 2.3 0.06 24.1
Ni 0.70 0.51 0.72 0.33 0.58 0.97 1.24 4.7 0.01 3.0

It stands out that Co exhibits both a very high skewness and kurtosis. This points towards a highly skewed distribution with possible outliers. Fe and Filler exhibit a relatively low skewness. The distributions of the variables will be investigated further below. X1, the drillhole ID, will be dropped from now on.

Regarding the units of the variables in the dataset, it can be assumed that the values of the chemical elements are in weight-percent and the spatial coordinates are in meters. The exact coordinate system is not known, but it appears to be a local Cartesian coordinate system. It is not known whether X denotes northing or easting. For the sake of simplification, it is chosen to be the same as the traditional X-axis in plots.

Furthermore, the chemical variables and the Filler sum to 100 %. This allows the chemical components to be treated as a composition.
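The closure can be verified directly on the data. Using the values of the first sample shown in the table above:

```r
# The six components of the first sample sum to 100 % (up to rounding).
comp <- c(Al = 4.70, Co = 0.001, Fe = 41.50,
          Filler = 53.685, Mg = 0.09, Ni = 0.024)
sum(comp)  # 100
# On the full table one would check e.g. range(rowSums(dt[, Al:Ni])).
```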

3.1.2 Density Plots

While histograms are a common and familiar technique to evaluate distributions of data, they are harder to read when multiple groups are expressed in one plot. Because of this, density plots are chosen for visualization in this work. This allows for a grouping of the data according to their lithology.


#prepare dt
colnames.num <- names(dt[,.SD, .SDcols = is.numeric][,-c("X1", "X", "Y")])
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Lcode")],
                        id.vars = c("Lcode"))
#draw density plots

a <- ggplot(data = dt.m, aes( x=value, fill=Lcode)) +
  geom_density( alpha = 0.3, aes(y = ..density..))+
  facet_grid(scales = "free", rows = vars(variable))
  
a

It shows that for all variables except Fe and Filler, logarithmic scales seem preferable. This is in line with the findings from the summary statistics. Hence, in the following, the logarithmized results are shown:


#prepare dt
colnames.num <- names(dt[,.SD, .SDcols = is.numeric][,-c("X1", "X", "Y")])
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Lcode")],
                        id.vars = c("Lcode"))
#draw density plots of the log-transformed values

a <- ggplot(data = dt.m, aes( x=log(value), fill=Lcode)) +
  geom_density( alpha = 0.3, aes(y = ..density..))+
  facet_grid(scales = "free", shrink=F, rows = vars(variable))
  
a

Both from the general density plots as well as from the log-density plots, it can be seen that the distributions differ between the lithologies. For almost all elements, differences in the empirical distribution density functions can be observed: both the spread of the distributions and the measures of centrality differ. The observed differences are discussed further in the subsequent sections.

It can also be seen that a logarithmic scale provides distributions that are less skewed and closer to a normal distribution, which is a prerequisite for the kriging analysis.

3.1.3 QQ-Plots

To confirm this, normal QQ-plots are drawn for the untransformed data:

{par(mfcol=c(2,3))
qqnorm((dt$Al), col=2);qqline((dt$Al));legend(legend = "Al","topleft")
qqnorm((dt$Co), col=2);qqline((dt$Co));legend(legend = "Co","topleft")
qqnorm((dt$Fe), col=2);qqline((dt$Fe));legend(legend = "Fe","topleft")
qqnorm((dt$Filler), col=2);qqline((dt$Filler));legend(legend = "Filler","topleft")
qqnorm((dt$Mg), col=2);qqline((dt$Mg));legend(legend = "Mg","topleft")
qqnorm((dt$Ni), col=2);qqline((dt$Ni));legend(legend = "Ni","topleft")}

Large deviations can be seen for most distributions; especially the tail behavior differs from that of a normal distribution. Hence, the data are log-transformed:

dt.orig <- dt #backup for debugging
fkt <- c("Al", "Co","Fe", "Filler", "Mg", "Ni") #columns to transform
fkt_new <- c("Al_log", "Co_log","Fe_log", "Filler_log", "Mg_log", "Ni_log") #names of the transformed columns
dt <- dt[,(fkt_new) := lapply(.SD, log), .SDcols = fkt] #actual transformation with data.table

Now the QQ-plots for the log-transformed data are as follows:

{par(mfcol=c(2,3))
qqnorm((dt$Al_log), col=2);qqline((dt$Al_log));legend(legend = "Al_log","topleft")
qqnorm((dt$Co_log), col=2);qqline((dt$Co_log));legend(legend = "Co_log","topleft")
qqnorm((dt$Fe_log), col=2);qqline((dt$Fe_log));legend(legend = "Fe_log","topleft")
qqnorm((dt$Filler_log), col=2);qqline((dt$Filler_log));legend(legend = "Filler_log","topleft")
qqnorm((dt$Mg_log), col=2);qqline((dt$Mg_log));legend(legend = "Mg_log","topleft")
qqnorm((dt$Ni_log), col=2);qqline((dt$Ni_log));legend(legend = "Ni_log","topleft")}

It can be seen that:

  • for Al, the logarithmization moves the tail from the lower side to the upper side,
  • for Co, a significant improvement towards a normal distribution can be observed,
  • for Fe, the symmetrical tail behavior is changed to a one-sided tail behavior,
  • for Filler, little difference can be observed,
  • for Mg, an improvement is observed, although it still shows a significant two-sided tail, and
  • for Ni, a small improvement is observed in the central part of the distribution.

As a result, all variables, even Fe and Filler, will be used in their logarithmized version. In this way, all variables can be compared with each other.
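The graphical impression can be backed up numerically, e.g. with a Shapiro-Wilk test. This is not part of the original workflow; the sketch below uses a synthetic log-normal variable as a stand-in for a skewed element such as Co:

```r
# Compare normality of raw vs. log-transformed values; W closer to 1
# indicates closer agreement with a normal distribution.
set.seed(1)
x <- rlnorm(200)                         # skewed, log-normally distributed
w_raw <- shapiro.test(x)$statistic       # W clearly below 1 for raw values
w_log <- shapiro.test(log(x))$statistic  # W near 1 after log-transformation
c(raw = unname(w_raw), log = unname(w_log))
```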

3.2 Categorical Variables

The summary for the categorical variables is shown below. Of relevance for the understanding of this dataset are the two variables Hard and Lcode. Additionally, the variable subset exists; it stores which part of the dataset is used for training and which for validation of the modeling process. It is distributed almost 50-50 across the dataset. The table below shows the number of members in the respective groups.


dt[,.SD, .SDcols = is.factor] %>% summary %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Hard Lcode subset
0: 53 FZ:334 training :371
1:394 SA: 65 validation:364
2:166 SM:278
3:122 UM: 58

The interrelationship of hardness and lithology is plotted with a mosaic plot. Hereby, all possible combinations of levels for Lcode and Hard are plotted against each other. The size of the cells along each of the two axes corresponds to the number of members of that level. Additionally, the Pearson residuals are plotted as blue and red shades. A positive Pearson residual corresponds to more members being in this level combination than the null hypothesis would suggest; the null hypothesis is that no particular association exists between the groups. More information can be found in the supplement to the lecture [3], or in [4]. In addition to the mosaic plot, the numeric values of the group combinations are shown in the contingency table.
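The shading of the mosaic plot can be reproduced numerically: the Pearson residuals (observed minus expected counts, scaled by the square root of the expected counts) follow from a chi-square test of the contingency table shown in this section:

```r
# Counts from the contingency table (rows: Hard 0-3, columns: lithology).
tab <- matrix(c(19, 169, 81, 65,   # FZ
                10,  43,  6,  6,   # SA
                21, 181, 63, 13,   # SM
                 3,   1, 16, 38),  # UM
              nrow = 4,
              dimnames = list(Hard = 0:3, Lcode = c("FZ", "SA", "SM", "UM")))
res <- suppressWarnings(chisq.test(tab))  # warns: some expected counts < 5
round(res$residuals, 1)  # e.g. Hard 3 / UM shows a large positive residual
```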

Mosaic plot



#draw mosaic plot
vcd::mosaic(~Hard+Lcode, data = dt, shade=TRUE, legend=TRUE, direction="v")

Contingency Table

#calculate contingency table
ct <- dt[,.SD, .SDcols = c("Hard", "Lcode")] %>% 
  table() %>% as.data.frame.array() 

#add colsums
csum <- ct %>% apply(.,2,sum)
ct["Sum",] <- csum
#add rowsums
rsum <- ct %>% apply(.,1,sum)
ct$Sum <- c(rsum)

#print
ct %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
FZ SA SM UM Sum
0 19 10 21 3 53
1 169 43 181 1 394
2 81 6 63 16 166
3 65 6 13 38 122
Sum 334 65 278 58 735

It shows that rock hardness 3 is positively associated with UM (fresh ultramafic zone) and negatively with SM (smectitic zone). Apart from that, weaker associations seem to exist between SA (saprolitic zone) and hardness 0 as well as between SM and hardness 1. Especially the association of hardness level 3 is logical, because fresh unweathered rock is generally harder/more robust than its weathered counterparts. In this dataset, the hardness level could also identify or incorporate indicators of the rock mass quality, as details on the nature of the variable Hard are not known.

3.3 Association of Elements to Rock Classes

To identify the association of the elements to the lithologies, boxplots after Tukey are produced:

#for grouped stats with ggplot, melted data tables are a convenient option
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Lcode")],
                        id.vars = c("Lcode"))

ggplot(data = dt.m, aes( y=log(value),group=Lcode, fill=Lcode)) +
  geom_boxplot( alpha = 0.3, notch = T)+
  facet_wrap(scales = "free_y", shrink=F, ~variable, ncol=3)+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

Clear differences in the element components between the lithologies can be seen. For Al and Fe, the highest values are found in zone FZ, followed by SA, SM and UM in decreasing order. Mg and Filler behave oppositely, with the lowest values found in FZ, and SA, SM and UM showing values in increasing order. Co and Ni show a different behavior: here, the highest values are found in SA, followed by SM. The association of SA with Ni and Co is already known from the geological description of the site and can be confirmed here. FZ and UM show similarly low values of the two elements. The notches give a graphical indication of whether the differences in the medians are significant: when the notches do not overlap, there is strong evidence that the medians differ. According to that, the differences between:

  • SM and UM for Al,
  • SA and SM for Co,
  • SM and UM for Filler,
  • SM and UM for Mg and
  • SA and SM for Ni

are not significant.

The results show, however, that a relatively strong influence of the lithology on the mineralization can generally be assumed.
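The notch criterion used by geom_boxplot can be written out explicitly: ggplot2 follows the McGill et al. rule of median ± 1.58·IQR/√n. A sketch on synthetic data:

```r
# Approximate 95 % interval for the median, as used for boxplot notches.
notch <- function(x) {
  m <- median(x)
  half <- 1.58 * IQR(x) / sqrt(length(x))
  c(lower = m - half, upper = m + half)
}
set.seed(42)
a <- notch(rnorm(100))      # notch interval around 0
b <- notch(rnorm(100) + 2)  # notch interval around 2
a["upper"] < b["lower"]     # TRUE: notches do not overlap, medians differ
```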

3.3.1 Association of Elements to Rock Hardness

To identify whether similar findings can also be observed for the rock hardness, the procedure is repeated with the data grouped by hardness.

Elements vs Hardness

#for grouped stats with ggplot, melted data tables are a convenient option
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Hard")], 
                        id.vars = c("Hard"))

ggplot(data = dt.m, aes( y=log(value), fill=Hard)) +
  geom_boxplot( alpha = 0.3, notch = T)+
  facet_wrap(scales = "free", shrink=F, ~variable, ncol=3)

Detail: Elements vs. Hardness for “SA”


dt.m <- melt.data.table(dt[Lcode=="SA",.SD, .SDcols = c(colnames.num, "Hard")], # Example for detailed view for Lcode="SA"
                        id.vars = c("Hard"))

ggplot(data = dt.m, aes( y=log(value), fill=Hard)) +
  geom_boxplot( alpha = 0.3, notch = F)+
  facet_wrap(scales = "free", shrink=F, ~variable, ncol=3)+
  labs(title = "Composition parts vs. Hardness for Lcode SA")

While some differences can be seen, the general picture is less distinct than for the rock type. UM tends to be the hardest zone and also tends to contain more Mg; this, however, does not show in the hardness.

For Co, at first sight the picture is very similar to the results from the boxplots grouped by Lcode. Zone SA (which is relatively strongly associated with hardness 0) showed the highest Co content; here, however, hardness 1 shows the highest Co contents. So a simple relationship cannot be assumed. Cross-interrelations between lithology and hardness could also be suspected.

For example, within zone SA, an even stronger accumulation of Co could be associated with a certain rock hardness. This is shown exemplarily in the second boxplot. There, the notches are removed because the results are partially so skewed that the notches would extend beyond Q1 resp. Q3. Generally, a similar picture can be observed for the elements Co, Mg and Ni: a stronger accumulation of these elements occurs at hardness 1. The composition of a rock mass can influence the hardness even at a small scale. Some indicators for this during mechanical excavation were found by Rimos et al. [3] during a master thesis at TU BAF. There, the cutting resistance to cutting with conical picks was tested on pyritic Pb-Zn ore samples from Reiche Zeche. It showed a higher cutting resistance in areas in which increased Ca contents were found. The element contents were approximated in a dense pattern by means of a handheld XRF device.

However, due to the complexity this would add to the model building, this influence is dropped, as it would exceed the framework of this work.


3.4 Correlation and Covariance

The correlation and covariance statistics for the numeric variables are presented in the following. The correlation is calculated twofold: as Spearman correlation and as Pearson correlation. The Pearson correlation coefficient measures the goodness of fit of a linear regression between two variables; a Pearson coefficient of 1 equals an R² of 1 for a linear regression. The Spearman correlation coefficient follows the same basic principles but considers the ranks of the data pairs instead of their actual values. As such, a Spearman correlation of 1 means that the relation between the covariates follows a perfectly monotonously ascending function (-1 would indicate a descending function). Consequently, a high Spearman and a low Pearson coefficient could indicate e.g. a monotone but non-linear relation. A more detailed description of the two methods can be found e.g. in [2].
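The distinction can be illustrated with a small example: for a monotone but non-linear relation, the rank-based Spearman coefficient is exactly 1, while the Pearson coefficient stays below 1.

```r
x <- 1:50
y <- exp(x / 10)                  # strictly increasing, strongly non-linear
cor(x, y, method = "spearman")    # 1: the ranks agree perfectly
cor(x, y, method = "pearson")     # < 1: the relation is not linear
```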

Values in the upper triangle show the respective Spearman correlation coefficients; values in the lower part show the Pearson correlation coefficients. Correlations that do not satisfy a significance threshold of p < 0.05 are marked with a black cross. These correlations can be considered insignificant.

All calculations are done with the logarithmized values - except for the spatial variables X and Y - as these will be used for the further interpolation. For comparison, the covariance matrix is also computed.

Correlation

use <- cbind(dt[,2:3], dt[,4:9] %>% log())

dt.ptest.sp <- cor.mtest(use,method="spearman")
dt.cor.sp <- cor(use,method = "spearman")
dt.ptest.p <- cor.mtest(use,method="pearson")
dt.cor.p <- cor(use,method = "pearson")


corrplot(dt.cor.sp,p.mat = dt.ptest.sp$p,
         method = "color", outline=T,
         type="upper",
         addgrid.col = "grey", tl.col = "black",
          tl.pos= "td",
         addCoef.col="black",
         sig.level = 0.05, 
         order="original", hclust.method = "complete", addrect = 6, rect.col = "black",
         mar =c(1,0,2,1)) 

  corrplot(dt.cor.p,p.mat = dt.ptest.p$p,
         method = "color", outline=T,
         type="lower",
         addgrid.col = "grey", tl.col = "black",
          tl.pos= "lt",
         addCoef.col="black",
         cl.pos="n",
         sig.level = 0.05, 
         order="original", hclust.method = "complete", addrect = 6, rect.col = "black",
         mar =c(1,0,1,1), add=T)
  
mtext(side = 3,line = 2, "Top: Spearman")
mtext(side = 2, "Bottom: Pearson")

Generally, it can be seen that the Pearson and Spearman correlations show similar results. In some cases, the Spearman correlation gives higher values than the Pearson correlation; however, the opposite can also be observed here. When the Pearson correlation is higher than the Spearman correlation, this speaks for small deviations from linearity that reduce the rank-based Spearman correlation more than the Pearson correlation. When the Spearman correlation is higher, a monotonous but less linear relation is present.

The variables X and Y show very low correlations with the geochemical variables, some of them also insignificant. This speaks for the absence of a general spatial trend, which will be investigated later in more detail.
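Such a trend check can also be formalized by regressing a grade on the coordinates. A sketch on synthetic data (all variable names are stand-ins, not the dataset columns) in which a deliberate E-W trend is built in and clearly detected:

```r
# Regress a grade on the coordinates; small p-values indicate a trend.
set.seed(3)
x <- runif(200, 0, 1300)          # synthetic easting
y <- runif(200, 0, 850)           # synthetic northing
grade <- 0.01 * x + rnorm(200)    # linear trend in x plus noise
fit <- summary(lm(grade ~ x + y))
fit$coefficients["x", "Pr(>|t|)"] # practically 0: the trend in x is detected
fit$coefficients["y", "Pr(>|t|)"] # no systematic trend in y was built in
```

For the real data, the low and partly insignificant coordinate correlations above suggest such a model would find no convincing trend.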

Some correlation pairs show a higher correlation after Spearman. This applies mainly to Mg/Ni, Al/Ni and Filler/Fe. Others, like Mg/Filler and Fe/Al, show a higher correlation after Pearson.

Generally within the group of chemical variables, the pairs show the following noteworthy correlations:

  • Al/Fe and Ni/Co, Mg/Filler show relatively strong positive correlation
  • Al/Filler, Al/Mg and Al/Ni, Fe/Filler, Fe/Mg show a negative correlation

Noteworthy is that Al correlates positively only with Fe and negatively with Filler; this speaks for the fact that it occurs as a companion of the iron and not as its own silicate mineralization. This is in line with the geological description after [1], which describes a lateritic zone. Co and Ni show a positive relation, while at the same time Ni shows a negative relation with Al. This speaks for the fact that these two form their own mineralization complex.

Covariance

In the following, the covariance matrix for the chemical elements only (log-transformed) is shown.


use <- dt[,4:9] %>% log()

dt.cov <- cov(use,method = "pearson")



corrplot(dt.cov,
         method = "color", outline=T,
         type="upper",
         addgrid.col = "grey", tl.col = "black",
          tl.pos= "td",
         addCoef.col="black",
         sig.level = 0.05, 
         order="original", hclust.method = "complete", addrect = 6, rect.col = "black",
         mar =c(1,0,2,1),is.corr = F,
         number.digits=2, number.cex = 0.8,
         cl.align.text = "l") 

The matrix shows similar results to the correlation analysis. However, it also shows the magnitude of variation, since it is a non-standardized version of the correlation. As such, Fe and Filler generally vary little, although they show the highest concentration values. The highest absolute covariance values show for the pairs Al-Mg and Fe-Mg; since the values are negative, the correlation is also negative. Also noteworthy is that the pair Fe/Filler shows a strong correlation but a weak covariance. This can be attributed to the low variance of Filler.
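The relation between the two matrices can be made explicit: cor(x, y) = cov(x, y) / (sd(x)·sd(y)), so rescaling a variable changes the covariance but not the correlation. A small sketch:

```r
set.seed(7)
x <- rnorm(100)
y <- 0.9 * x + rnorm(100, sd = 0.2)       # strongly correlated with x
y_small <- y / 100                        # same relation, much smaller variance
cov(x, y)                                 # sizeable covariance
cov(x, y_small)                           # covariance shrinks by a factor of 100
abs(cor(x, y) - cor(x, y_small)) < 1e-10  # TRUE: correlation is unaffected
```

In base R, `cov2cor()` performs exactly this standardization on a covariance matrix.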

3.5 Scatter Plot Matrix and Swath Plots

Swath plots are used to assess whether a trend in the spatial directions might exist. There, the element contents are plotted against the X resp. Y axis and overlaid with a moving-average estimation; in this case, a loess estimator is used. The swath plots are shown in the following for the elements irrespective of their rock class.

A scatter plot matrix can shed more light on the cross-relations between the covariates. It is a tool that gives a broad understanding of the situation and allows a more detailed picture than a sheer correlation analysis. Here, the data are grouped according to the rock type to identify whether the correlation behavior differs between the rock types. In the upper part, smoothed (loess) regression lines are drawn. On the diagonal, the density plots already shown previously are drawn. In the lower section, raw scatterplots are drawn.

Swath Plot X-direction

X <- dt[,c("X", "Y")]
swath(data = dt[,..fkt_new], loc=X$X)

For Mg_log, an outlying, strongly varying area can be seen at low X-values. Slight trends could also be observed for Fe, Filler and, to an extent, Ni. It can also be seen that some elements, in particular Mg, seem to appear in at least two clusters: one that shows higher values and one that shows lower values.

Swath Plot Y-direction


X <- dt[,c("X", "Y")]
swath(data = dt[,..fkt_new], loc=X$Y, xlab = "Y")

In Y-direction, stronger local trends appear at low Y-values. Here, too, the appearance of clusters can be suspected.

Scatter plot matrix

library(GGally)


bigplot <- ggpairs(data = dt,columns = c(2,3,13:18),
                   mapping = ggplot2::aes(colour=Lcode), 
                   lower = list(continuous = wrap("points", alpha = 0.8, size=0.3)), 
                   diag = list(discrete="barDiag", 
                               continuous = wrap("densityDiag", alpha=0.5 )), 
                   upper = list(continuous = wrap("smooth_loess", alpha = 0.1, size=0.05)
                   ) 
) +
  theme(panel.grid.major = element_blank())
(bigplot)

The scatter plot matrix shows a more detailed picture of the situation. Also, since the variables are grouped by Lcode, a more detailed picture of its influence can be drawn.

The following points can be summarized from the scatter plot matrix:

  • The clear correlation between Fe and Filler is repeated

  • The scatter plots indicate clustered structures according to the rock type, where the behavior can differ considerably between the rock types. Comparing the regression lines with the correlation coefficients and the scatter plots shows that for most pairs the regression slopes are relatively weak and the data appear in clusters rather than in trends.

  • The correlation between the spatial variables X and Y and the chemical variables shows a curious effect. While the swath plots showed certain trends for some variables, the grouped scatterplots shed more light on these effects: there, the dominating effect defining the element contents seems to be the Lcode. Because the Lcode is distributed heterogeneously over the study area, an apparent trend can be seen. As such, instead of incorporating a static trend, a correlation of the rock type with the element contents will be considered. Within the Lcodes, only partial local trends can be seen.

These findings draw a somewhat unclear picture. While the swath plots indicate a certain trending behavior in the Y direction, a more diverse and noisy behavior shows in the grouped scatterplots. The presence of a trend largely defines the extrapolation behavior. Since extrapolation is not an aim of this study and the sampling density is rather high, no trend will be used for the subsequent analysis. The correlation of Lcode with the element contents is incorporated in the analysis variants VK2 and VS2.

In order to shed more light on possible clustering behavior, a k-means cluster analysis could provide more insight into the situation. It is not included in this report to maintain brevity.
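For completeness, a hedged sketch of how such a k-means analysis could look, shown on synthetic two-group data; on the real data one would instead cluster e.g. scale(dt[, .(Al_log, Fe_log, Mg_log, Ni_log)]) and compare the cluster labels with Lcode:

```r
# k-means on two well-separated synthetic groups of 50 points each.
set.seed(11)
grp1 <- matrix(rnorm(100, mean = 0), ncol = 2)   # 50 points around (0, 0)
grp2 <- matrix(rnorm(100, mean = 8), ncol = 2)   # 50 points around (8, 8)
km <- kmeans(rbind(grp1, grp2), centers = 2, nstart = 10)
table(km$cluster)   # two clusters of 50 points each
```

A cross-table of km$cluster against Lcode would then show whether the geochemical clusters reproduce the lithological zones.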

4 Spatial Descriptive Analysis

In the following, the input variables are described with respect to their spatial distribution. The categorical and continuous variables are first shown separately and then combined in the final subsection. Furthermore, the framing parameters are estimated for the semivariographic analysis that is conducted prior to the actual geospatial modeling.

4.1 Categorical Variables

The following maps plot the variables Hard and Lcode as dot plots. The X- and Y-axes represent the respective coordinates in the input data. Only the training data are plotted.


dt.m <- melt.data.table(dt[subset=="training",.SD, .SDcols = c("Lcode", "Hard",  "X", "Y")],
                        id.vars = c("X", "Y"))



temp <- list()
for (i in unique(dt.m$variable)) {
temp[[i]] <- ggplot(dt.m[variable==(i)], aes(x = X, y = Y, color = value)) +
  geom_point(shape=19)+
  coord_fixed()+
  ggtitle(label = i) #+
  #scale_color_gradient(low = "blue2",high = "red2")

}

do.call("grid.arrange", c(temp, ncol=2))



It can be seen that Hard is relatively evenly distributed; however, hardness levels 2 and 3 seem to accumulate towards the east of the field. For Lcode, an accumulation of FZ can be seen in a central E-W band. SM is accumulated more on the western, as well as the northern and southern, borders of the project. Also of importance are the levels UM and SA, which are relatively sparsely distributed. UM can be found in thin bands along the ESE-WNW axis. SA can be found in single patches in areas where FZ and SM seem to show contact zones.

4.2 Continuous Variables

To obtain an understanding of the spatial distribution of the different elements, the values are plotted as follows:



dt.m <- melt.data.table(dt[, .SD, .SDcols = c(fkt_new, "X", "Y")],
                        id.vars = c("X", "Y"))



dt.m
##          X   Y variable value
##    1: 1325 225   Al_log  1.55
##    2: 1325 325   Al_log  0.36
##    3: 1300 250   Al_log  1.28
##    4: 1275 175   Al_log  1.63
##    5: 1275 225   Al_log  0.47
##   ---                        
## 4406:   50 675   Ni_log -2.73
## 4407:   50 825   Ni_log -0.93
## 4408:   25 150   Ni_log -0.39
## 4409:   25 700   Ni_log -1.71
## 4410:    0 125   Ni_log -0.55
temp <- list()
for (i in unique(dt.m$variable)) {
temp[[i]] <- ggplot(dt.m[variable==(i)], aes(x = X, y = Y, color = value)) +
  geom_point()+
  coord_fixed()+
  ggtitle(label = i)+
  scale_color_gradientn(colors = c("blue2","purple", "red2"))
  
}

do.call("grid.arrange", c(temp, ncol=3))

The maps show that areas with high concentrations of Al tend to show low concentrations of Co, Mg and, to some extent, Ni. The Filler variable also exhibits this behavior. Generally, two different groups of mineralization seem to exist: one with high Al and Fe concentrations and one with high Mg, Ni, Co and Filler concentrations. This is in line with the earlier findings, where the Lcode corresponds with certain element levels.

4.3 Bubble plot

For a better visualization of the element distribution in relation to the Lcode, a bubble plot is a good tool. Here, the assay locations are drawn as bubbles; their color indicates the rock type and their size the element content. As examples, the plots for Mg and Ni are shown.


a <- ggplot(data=dt, aes(x = X, y=Y, size=Mg_log, fill=Lcode))+
  geom_point(shape=21, color="black", alpha=0.7)+   # "linecolor" is not a geom_point argument; shape 21 gives the black outline
  coord_fixed()+
  ggtitle(label = "Mg_log")

b <- ggplot(data=dt, aes(x = X, y=Y, size=Ni_log, fill=Lcode))+
  geom_point(shape=21, color="black", alpha=0.7)+
  coord_fixed()+
  ggtitle(label = "Ni_log")

grid.arrange(a, b, nrow=1)

For Mg, it can be seen that the ferruginous zone generally shows lower magnesium contents, while higher values are observed in the other three zones. For Ni, the picture is less clear; however, lower contents also seem to appear in zone FZ.

4.4 Bin Width and Maximum Distance

For the subsequent geostatistical modeling procedures, the bin width and maximum variogram distance are estimated as follows. For this estimation, only the assay points of the “training” part of the data set are used.

The minimum distance between sample sites is ca. 35 m; this value is used as the bin size. The maximum distance can be estimated from the median distance between assay sites, or graphically as the radius of the biggest circle that fits inside the boundaries of the survey area. The median distance is 431.57 m; the radius of the biggest inscribed circle is estimated at ca. 230 m. As a tradeoff between the two values, a cutoff of 300 m is chosen, since the area has a relatively elongated shape.
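The quoted framing parameters can be reproduced along these lines; a toy point set stands in for the training coordinates so the snippet is self-contained:

```r
# Bin width = smallest sample spacing; cutoff via the median-distance
# heuristic (the inscribed-circle radius is read off graphically).
xy <- cbind(X = c(0, 35, 70, 200, 400), Y = c(0, 0, 10, 150, 300))
d  <- as.numeric(dist(xy))   # all pairwise Euclidean distances
bin_width <- min(d)          # -> bin size
max_dist  <- median(d)       # -> one heuristic for the variogram cutoff
c(bin = bin_width, cutoff = max_dist)
```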

For reference, the subsequent plot shows the training and validation data locations:



a <- ggplot(dt, aes(x = X, y = Y, color = subset)) +
  geom_point()+
  coord_fixed()+
  ggtitle(label = "Interactive Plot Training/Validation Data Set")

plotly::ggplotly(a)

5 Indicator Kriging IK

For the interpolation of the categorical variable, ordinary indicator kriging is used. No cokriging is used in this case; the variables are modeled individually. (A version of indicator cokriging is included in the mixed-model cokriging VK2.) Here, sequential fitting of anisotropic variograms was used; as such, no interaction between the different rock zones is assumed. This allows a more exact fit of the single semivariogram models at the cost of neglecting possible relationships between them. It also allows a comparison between the interpolation results of the mixed model and the sequential indicator kriging. Shown below is one of the fitted variograms.
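The indicator coding behind formulas such as (Lcode == "FZ") ~ 1 amounts to a 0/1 transform per lithology, as this small sketch illustrates:

```r
# Each lithology gets its own 0/1 indicator; gstat builds these
# on the fly from the logical comparison in the formula.
Lcode <- factor(c("FZ", "FZ", "SA", "SM", "UM", "FZ"))
ind <- sapply(c("FZ", "SA", "SM", "UM"),
              function(l) as.integer(Lcode == l))
rowSums(ind)  # indicators of an exhaustive classification sum to 1
```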

#prepare grid for all further use
options(scipen = 5, digits = 6)
extends <-dt[,c("X", "Y")] %>% sapply(range) %>% as.data.frame()
x = seq(from=min(extends$X)-100, to=max(extends$X)+100, by=10)
y = seq(from=min(extends$Y)-100, to=max(extends$Y)+100, by=10)
xy_grid = expand.grid(X=x, Y=y)

# create the gstat containers
gs_i_FZ <- gstat(id="FZ", formula = (Lcode=="FZ")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)

gs_i_SA <- gstat(id="SA", formula = (Lcode=="SA")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300) 

gs_i_SM <- gstat(id="SM", formula = (Lcode=="SM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300) 
  
gs_i_UM <- gstat(id="UM", formula = (Lcode=="UM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300) 

Lcodes <- c("FZ", "SA", "SM", "UM")
#initiate the variograms as loop for each LCode
for (i in c("FZ", "SA", "SM", "UM")){
  assign(paste0("vg_i_", i), 
         variogram(get(paste0("gs_i_", i)),
                   width=35, cutoff=300, alpha=c(0:5)*30 ))
}

#fit FZ
#plot(vg_i_FZ)
vgt_FZ  <- vgm(psill=0.1, model="Wav", range=120, nugget=0,anis = c(40,0.4), 
               add.to = vgm(psill=0.15, range=350,  nugget=0.1, model="Gau"))
  
 # plot(vg_i_FZ, model=vgt_FZ)

#  fit SA
#plot(vg_i_SA)
vgt_SA  <- vgm(psill=0.05, model="Wav", range=50, nugget=0.03,anis = c(0,0.4))
  
  #plot(vg_i_SA, model=vgt_SA)
  
#  fit SM
#  plot(vg_i_SM)
vgt_SM  <- vgm(psill=0.05, model="Wav", range=80, nugget=0.03,anis = c(0,0.4),
               add.to = vgm(psill=0.15, range=350,  nugget=0.1, model="Exp"))
  
  #plot(vg_i_SM, model=vgt_SM)

  #fit UM
#  plot(vg_i_UM)
vgt_UM  <- vgm(psill=0.05, model="Exp", range=150, nugget=0.02,anis = c(120,0.3),
               add.to = vgm(psill=0.01, range=50,  nugget=0, model="Wav",anis = c(30,0.3)))
  
  plot(vg_i_UM, model=vgt_UM)

The figure shows the fitted variogram model for UM. To fit it, an overlay of an exponential and a wave function was used with varying azimuths and anisotropy ratios: the exponential part has an azimuth of 120° and the wave part an azimuth of 30°. The figure also shows that a highly accurate fit could not be achieved; however, the main characteristics of the variogram structure are captured. In summary, the following variogram models were used for the fitting:

  • FZ: Nested Wave (anisotropic) and Gaussian (omnidirectional)
  • SA: Wave (anisotropic)
  • SM: Nested Wave (anisotropic) and Exponential (omnidirectional)
  • UM: Nested Wave (anisotropic) and Exponential (anisotropic)

With the fitted variograms, the actual indicator kriging is conducted:

  #update gstats
  
  for (i in Lcodes){
  assign(paste0("gs_i_", i),
         gstat(id=i,
               formula = (Lcode==paste0(i))~1,
               locations = ~X+Y,
               data=dt[subset=="training"],
               model = get(paste0("vgt_",i)),
               nmax=30)
  )
}

  #kriging
  
  for (i in Lcodes){
  assign(paste0("krig_", i),
         predict(get(paste0("gs_i_", i)),
                     newdata=xy_grid)
  )
}
## [using ordinary kriging]
## [using ordinary kriging]
## [using ordinary kriging]
## [using ordinary kriging]
krig_i_s <- cbind(krig_FZ, krig_SA[,3:4], krig_SM[,3:4], krig_UM[,3:4])

krig_i_s$Lcode <- krig_i_s %>% select(contains(".pred")) %>% max.col %>% 
factor(x = .,levels = c(1,2,3,4),labels = Lcodes) 

krig_i_s %<>% as.data.table()

# applying variance cutoff of Q0.90
var_cutoff <- krig_i_s$FZ.var %>% quantile(0.90, na.rm = T)
krig_i_s <- krig_i_s[FZ.var < var_cutoff]


#Plotting the results

a <- ggplot(data = krig_i_s, aes(x=X, y=Y, fill=Lcode))+
  geom_raster()+
  coord_fixed()+
  geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y,  fill=Lcode), size=3, shape=21, alpha=0.7)+
  labs(fill="Lcode", title = "Lcode")

a

The image shows the interpolated results as well as the original sampling points. It can be seen that small “islands” appear for the SA areas. For UM, which is also relatively scarce, smaller patchy areas are modeled. At some sampling points, the modeled patches are so small that they visually do not extend beyond the overlain sampling points.

The quality of the interpolation is quantified with a confusion matrix as follows. The confusion matrix shows the number of counts in the respective fields, the sum of all predictions per class as well as the sum of the true classes (“Target”). The percentage values show the sensitivity for the respective class, defined as the number of true positive predictions divided by the sum of true positives and false negatives. Additionally, the overall accuracy is calculated (not shown in the plot) as the ratio of correct predictions to the total number of cases for this model [5].
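The two definitions can be checked on a small made-up confusion matrix (the counts are illustrative only, not taken from this model):

```r
# Rows = targets, columns = predictions; toy counts.
cm <- matrix(c(50,  5, 10,  2,
                4, 20,  6,  1,
                8,  3, 40,  2,
                1,  0,  4, 10),
             nrow = 4, byrow = TRUE,
             dimnames = list(target = c("FZ","SA","SM","UM"),
                             pred   = c("FZ","SA","SM","UM")))
sensitivity <- diag(cm) / rowSums(cm)   # TP / (TP + FN), per class
accuracy    <- sum(diag(cm)) / sum(cm)  # correct / all cases
round(c(sensitivity, accuracy = accuracy), 3)
```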


  #kriging for validation data
  
  for (i in Lcodes){
  assign(paste0("krig_val_", i),
         predict(get(paste0("gs_i_", i)),
                     newdata=dt[subset=="validation",])
  )
}
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]

krig_val_i_s <- cbind(krig_val_FZ, krig_val_SA[,3:4], krig_val_SM[,3:4], krig_val_UM[,3:4])

krig_val_i_s$Lcode <- krig_val_i_s %>% select(contains(".pred")) %>% max.col %>% 
factor(x = .,levels = c(1,2,3,4),labels = Lcodes) 

krig_val_i_s %<>% as.data.table()


CM_i <- cvms::confusion_matrix(targets =dt[subset=="validation", Lcode], predictions = krig_val_i_s$Lcode )

cvms::plot_confusion_matrix(CM_i$`Confusion Matrix`[[1]],
                      add_sums = TRUE,
                      add_normalized = F,
                      diag_percentages_only = T, add_row_percentages = F)

It can be seen that the accuracy is relatively mediocre; the overall accuracy is 58.8%.

In this context, it must also be mentioned that the classes UM and SA appear relatively sparsely and their structures are small/thin. As such, a certain error is to be expected in the comparison with the validation set.

6 Cokriging VK1

In this chapter, omnidirectional cokriging with the log-transformed values is conducted. Since this model is supposed to serve as a “simplistic” reference, neither anisotropy nor directional trends are incorporated.

6.1 Variography

As a first approach to modeling the dataset, the omnidirectional variograms are fitted as follows. Since the variogram matrix is rather big, use the show/hide button to show the plot.


#variables to use
vars_l = c("Al_log", "Co_log", "Fe_log", "Filler_log", "Mg_log", "Ni_log")

#loop for allocating values to gstat object - only training data
#--> slight change to the script: instead of prepopulating gs_l, it is initialized empty; this way the specifics of reading the variables have to be typed only once

{gs_l=NULL
for(j in vars_l){
  frm = as.formula(paste(j,1, sep="~"))
  print(frm)
  gs_l = gstat(gs_l, id=j, formula=frm, locations=~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)
}
}
## Al_log ~ 1
## Co_log ~ 1
## Fe_log ~ 1
## Filler_log ~ 1
## Mg_log ~ 1
## Ni_log ~ 1

vge_l = gstat::variogram(gs_l,width=35, cutoff= 300) # create variograms

#create initial varmodels
vgt_l = vgm(psill=0.5, range=120, nugget=1,  model="Sph", add.to=vgm(psill=0.8, range=300,   model="Gau"))

#fitting
gs_l = gstat::fit.lmc(v=vge_l, g = gs_l, model = vgt_l, correct.diagonal = 1.00001)

#plot varplots + fits
plot(vge_l, model = gs_l$model)

Most variogram fits are considered sufficient. Some element combinations show problematic fits; however, their weight in the final result is considered very small. As such, the fits are accepted and the actual cokriging commences.

Predictions with a variance greater than the 0.95-quantile of the kriging variance of Fe_log are filtered out for display and further purposes.

#making grid with a margin of 100 m around extends of survey area
options(scipen = 5, digits = 6)
extends <-dt[,c("X", "Y")] %>% sapply(range) %>% as.data.frame()
x = seq(from=min(extends$X)-100, to=max(extends$X)+100, by=10)
y = seq(from=min(extends$Y)-100, to=max(extends$Y)+100, by=10)
xy_grid = expand.grid(X=x, Y=y)

#prediction
cokriged = predict(gs_l, newdata=xy_grid, debug.level=-1, nmax=20, omax = 6)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
## 
100% done
#cokriged %>% select(contains(".var")) %>% summary # for debugging

#sets the variance cutoff above which values are omitted
var_cutoff <- cokriged$Fe_log.var %>% quantile(0.95, na.rm = T)

#omitting the respective values
cokriged <- cokriged %>% as.data.table()
cokriged <- cokriged[Fe_log.var < var_cutoff] 

cokriged_val <- predict(gs_l, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
## 
 18% done
100% done


#storing backtransformed kriging values
cokriged2 <- cokriged %>% select(contains(".pred")) %>% exp %>% cbind(cokriged[,c("X", "Y")], .)
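A caveat on the exp() backtransform used above: exponentiating the kriged log values returns the geometric rather than the arithmetic mean and therefore tends to be biased low. A toy illustration (bias corrections, e.g. for lognormal kriging, are beyond this sketch):

```r
# Jensen's inequality: exp(mean(log(x))) is the geometric mean and
# never exceeds the arithmetic mean for positive data.
x <- c(0.2, 0.5, 1.0, 4.0, 10.0)   # toy grades in %
naive <- exp(mean(log(x)))         # geometric mean
stopifnot(naive <= mean(x))        # holds for any positive data
c(naive = naive, arithmetic = mean(x))
```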

6.2 Results Cokriging

Shown below are the interpolation results of the omnidirectional cokriging for all elements and Filler. The values are back-transformed from log space into the original data space.

Al


a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=Al_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Al [%]")

a

Generally, only a few zones with higher Al contents of up to 12.5 % exist.

Co


a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=Co_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Co [%]")

a

Co is relatively scarce, with only three smaller areas where it is concentrated above 0.1 %. Especially one area in the south-west stands out.

Fe


a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=Fe_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Fe [%]")

a

Iron is relatively abundant in the deposit, with large areas containing above 35 % iron in the east and west as well as small strips in the north.

Filler


a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=Filler_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Filler [%]")

a

While Filler is not a real chemical variable, it is shown for reference purposes. It shows behavior opposite to that of iron. This is logical, since iron is the most abundant element and the covariates form a composition.

Mg


a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=Mg_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Mg [%]")

a

Mg is relatively scarce over large areas, with the main area below 10 %. However, small strips in the central north as well as in the south show concentrations of up to 30 %.

Ni


a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=Ni_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Ni [%]")

a

Ni shows several areas with higher contents, especially in the west with up to 1.5 % Ni. Higher concentrations can also be found in the center and southeast.

7 Mixed Model Cokriging VK2

In this chapter, the mixed-model cokriging is shown. Here, all chemical variables, including the Filler variable, and the Lcode are combined in one model. No trend is incorporated: according to Journel and Rossi 1989 [6], kriging with a trend is mainly of importance when extrapolation is sought, which is not the case here. However, anisotropy is incorporated in the model.

Note: in trial runs, adding a trend introduced an odd spiking behavior in the predictions; hence, the trend is omitted.

#populate gstat with num-vars
{gs_cs=NULL
for(j in vars_l){
  frm = as.formula(paste(j,"1", sep="~"))
  gs_cs = gstat(gs_cs, id=j, formula=frm, locations=~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200,nmin = 5)
}
}
#add cat. levels
gs_cs <- gstat(id="FZ", formula = (Lcode=="FZ")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200,g = gs_cs, nmin = 5) %>% 
  gstat(id="SA", formula = (Lcode=="SA")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200, nmin = 5) %>% 
  gstat(id="SM", formula = (Lcode=="SM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200, nmin = 5) %>% 
  gstat(id="UM", formula = (Lcode=="UM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200,nmin = 5) 

7.1 Variography

The modeling procedure is as follows: First, the empirical variograms are evaluated. Then, using the varioApplet provided during the short course sessions, a preliminary anisotropic model is fitted such that it provides a sufficient fit for the single elements. From this, the following anisotropy values were extracted:

  • Azimuth of strongest continuity: 160°
  • Anisotropy ratio: 0.56

With these values, the autofit function is used to fit the remaining parameters. For the fitting, a spherical model is applied.

The following figure shows the empirical variogram with the fitted anisotropic model, exemplarily for Ni_log. The azimuth was varied in 30° steps; the horizontal angular tolerance equals 15° (default).


# empirical variogram
vg_cs <- variogram(gs_cs,width=30, cutoff=300, alpha=c(0:5)*30)

anis <- c(156,0.52) #anisotropy values from applet

#fit model
vg_cs_m = vgm(psill=0.9, range=300, nugget=1,anis= anis,   model="Sph")
gs_cs= gstat::fit.lmc(v=vg_cs, model=vg_cs_m, g=gs_cs, correct.diagonal = 1.0001)

#gs_cs$model <- model4

##draw selected plot with code from applet

    # number of directions
    ndirs = vg_cs$dir.hor %>% unique %>% length
    # color scale
    col_dirs = RColorBrewer::brewer.pal(ndirs,"Set1") 

    # plot
    
    #for which variable?
    showel <- "Ni_log"
    vg0 = vg_cs[vg_cs$id==paste(showel),]
    
    plot(gamma~dist, data=vg0, pch=21, cex=1.2,  
       bg=col_dirs[as.factor(dir.hor)],  
       xlim=range(0,vg0$dist), ylim=range(0, vg0$gamma))
    invisible(vg0[, c("dist", "gamma", "dir.hor")] %>% 
      split(as.factor(vg0$dir.hor)) %>%   # split by direction (vg0, not vg_cs)
      sapply(lines, col="grey"))          # invisible() suppresses the NULLs
    
    
      myfun = function(x){
      lines(x[,c(1,2)], col=col_dirs[x$dir.hor/30+1], lty=2)
    }
    invisible(vg0[,c("dist", "gamma", "dir.hor")] %>% 
      split(as.factor(vg0$dir.hor)) %>% sapply(myfun))
       
     for(i in 1:6){
      angle = pi/2 - ((i-1)*30)*pi/180
      direction = c(sin(angle), cos(angle) ,0)
      variodens = variogramLine(gs_cs$model[[paste(showel)]], maxdist= 1.1*max(vg_cs$dist), dir=direction)
      lines(variodens, col=col_dirs[i], lwd=2)
     }
    legend(x = "bottomright", legend = vg_cs$dir.hor %>% unique %>% paste(., "°") , fill=col_dirs )

It can be seen that the fit is not perfect, but the general features of the variography are captured. The anisotropy angle varies slightly between the elements; as such, a deviation can be seen, e.g., for the shown Ni_log. With the fitted semivariogram models, the actual cokriging can be executed.

7.2 Results


cok_cs = predict(gs_cs, newdata=xy_grid, debug.level = -1, nmax=20, omax = 6 ) %>% as.data.table()
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
## 
100% done

#calculation of variance cutoff
var_cutoff <- cok_cs$Fe_log.var %>% quantile(0.90, na.rm = T)

#omitting the respective values
cok_cs <- cok_cs[Fe_log.var < var_cutoff] 



#predictions for evaluation set
cok_cs_val <- predict(gs_cs, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
## 
  7% done
100% done
#calculating the most probable Lcode from the results
Lcodes <- c("FZ", "SA", "SM", "UM")

cok_cs_val$Lcode <- cok_cs_val %>% select(paste0(Lcodes, ".pred")) %>% max.col() %>% 
   factor(x = .,levels = c(1,2,3,4),labels = Lcodes)

cok_cs$Lcode <- cok_cs %>% select(paste0(Lcodes, ".pred")) %>% max.col() %>% 
   factor(x = .,levels = c(1,2,3,4),labels = Lcodes)


#kriging results of backtransformed values
cok_cs2 <- cok_cs %>% select(paste0(fkt_new,".pred")) %>% exp %>% cbind(cok_cs[,c("X", "Y")], .)

The figure below shows the results for the element Ni (backtransformed into normal space), the spatial distribution of the variance as well as the modeling results of Lcode.

a <- ggplot(data = cok_cs2, aes(x=X, y=Y, fill=Ni_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=0.5)+
  labs(fill="Ni [%]", title = "Backtransformed Ni")


b <- ggplot(data = cok_cs, aes(x=X, y=Y, fill=Lcode))+
  geom_raster()+
  coord_fixed()+
  geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y,  fill=Lcode), size=0.7, shape=21)+
  labs(fill="Lcode", title = "Lcode")

c <- ggplot(data = cok_cs, aes(x=X, y=Y, fill=Ni_log.var))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=0.5)+
  labs(fill="Var", title = "Variance Ni_log")


grid.arrange(a,b,c, nrow=2)

The second plot reveals a problematic situation with Lcode. Since the sampling sites for SA are relatively sparsely distributed, the variogram could not capture the spatial dependence even at the minimum bin width, as can be seen in the variogram for SA. As a result, even in cells very close to SA-sampled drillholes, SA is not the “most probable” Lcode. This leads to rather odd results in the validation of Lcode, as shown in the following:




CM_m <- cvms::confusion_matrix(targets =dt[subset=="validation", Lcode], predictions = cok_cs_val$Lcode )

cvms::plot_confusion_matrix(CM_m$`Confusion Matrix`[[1]],
                      add_sums = TRUE,
                      add_normalized = F,
                      diag_percentages_only = T,add_row_percentages = F)

It can be seen that no validation data points were classified as UM or SA, although 27 and 32 members, respectively, exist in these classes. The sensitivity values lie at 60.67% for SM and 0.76% for FZ. Following from this, the overall accuracy lies at 60.2%, which is marginally better than for the indicator kriging - by 1.4 percentage points. This is remarkable given the total misclassification of the minor groups UM and SA and can be attributed to their relatively low contribution to the total sample count.

8 Logratio-composition Cokriging VK3

With compositional modeling, the covariates are treated as parts of a whole that sum to one. That is, when one variable decreases, one or several others have to increase in order to maintain the condition that the proportions of the composition sum to 1. As such, the compositions form a multidimensional simplex. To illustrate this, the ternary diagrams for the abundant variables Fe, Mg and Al, as well as for the elements Co, Ni and Fe, are shown below. The diagram scales were centered so that the sparse variables can be visually compared to the abundant Fe.



dtcomp0 <-  dt %>% select(c("Fe", "Mg", "Al")) %>% acomp
plot.acomp(dtcomp0,center = T)


dtcomp0 <-  dt %>% select(c("Co", "Ni", "Fe")) %>% acomp
plot.acomp(dtcomp0,center = T)

It shows, for example, that very high values of Mg correspond to low to medium Al values. Also, rising Mg values up to a certain point are accompanied by a relatively stable ratio of Fe to Al.
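The alr transform requested later via V = "alr" can be written out in base R; sketched here on a toy five-part composition with Fe as the (assumed) denominator part:

```r
# alr_j = log(x_j / x_D) for a chosen denominator part x_D; the inverse
# exponentiates, re-adds the denominator and recloses the result to 1.
x   <- c(Al = 0.05, Fe = 0.40, Mg = 0.10, Ni = 0.01, Filler = 0.44)
alr <- log(x[-2] / x[["Fe"]])   # drop Fe, divide by Fe
inv <- c(exp(alr), Fe = 1)      # inverse: ratios plus denominator
inv <- inv / sum(inv)           # reclosure
stopifnot(all(abs(sort(inv) - sort(x)) < 1e-12))  # round trip recovers x
```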

8.1 Swath Plots for compositions

Subsequently, swath plots of the compositions are used to assess whether a spatial trend needs to be considered for the compositions.

X-direction

dtt <- dt[subset=="training",]

#dtt <- dt

dtcomp = dtt %>% select(Al:Ni) %>% acomp # calculate compositions
 
X = dtt %>% select(X, Y) #coordinates

swath(dtcomp, loc=X$X)  # quite flat --> constant mean

In X-direction no larger trends are seen.

Y-direction


swath(dtcomp, loc=X$Y)

In Y-direction, local trends can be found. This is especially true for the Mg compositions at low Y-values, and to some extent for Co and Al. However, this does not apply to all variables; as such, a constant mean is assumed.

8.2 Compositional variography

Shown below is the omnidirectional variography as well as the variography taking anisotropy into account. The omnidirectional variogram also shows the fitting results as red lines. The anisotropic variography is shown only for comparison; the modeling process uses the fitted omnidirectional variograms.

Omnidirectional Variograms

# define the gmSpatialModel object
gmv = make.gmCompositionalGaussianSpatialModel(
  data = dtcomp, # compositional data
  coords = X,    # spatial coordinates
  V = "alr",     # use alr logratios
  formula = ~1,  # constant mean, no drift
  nmax = 20,     # cokriging neighbourhood has max. 20 samples
  maxdist = 300
)


# compute logratio variograms
lrvg = gmGeostats::logratioVariogram(gmv,maxdist=300, nbins=8)


# define a compositional linear model
lrmd = CompLinModCoReg(~nugget()+sph(200), comp=dtcomp)

# fit

lrmdf = gmGeostats::fit_lmc(v = lrvg, model=lrmd)
## iteration = 0
## Function Value
## [1] 10892.98
## ...
## iteration = 100
## Function Value
## [1] 49.36508
## ...
## Iteration limit exceeded. Algorithm failed.

# plot
plot(lrvg, lrvg=vgram2lrvgram(lrmdf)) 

Directional variograms

Shown below is a plot of the directional behavior of the variograms.

  # 
gmGeostats::logratioVariogram(gmv, maxdist=300, nbins=8,
                  azimuth=(0:11)*30, azimuth.tol=60) %>% 
  image

It can be seen that in E-W direction, a generally lower variation is observed for most logratio pairs. Hence, incorporating anisotropy would be advisable.

8.3 Fitting of Anisotropy (not used)

Note: while the anisotropy can be fitted, the resulting model could not be plotted due to an unknown error. Hence, an omnidirectional model is utilized, since the quality of the anisotropic fit cannot be evaluated. The code for fitting and plotting an anisotropic model is shown below for reference.


lrvga = gmGeostats::logratioVariogram(gmv, maxdist=300, nbins=10,
                          azimuth=c(0,45,90,115), azimuth.tol=60)

colors <- c("blue", "red", "yellow", "purple")


lrmda = LMCAnisCompo(Z=dtcomp,azimuths=c(0,45,90,115), models=c("nugget", "sph"))

lrmdfa = gmGeostats::fit_lmc(v = lrvga,g=gmv, model=lrmda)


plot(lrvga, lrvg=vgram2lrvgram(lrmdfa))

8.4 Results Logratio Cokriging

After fitting the logratio variograms, the actual modeling commences. In the following, the kriging results for the back-transformed logratio compositions are shown for the three elements Al, Co and Ni.

gmv = make.gmCompositionalGaussianSpatialModel(
  data = dtcomp, # compositional data
  coords = X,    # spatial coordinates
  V = "alr",     # use alr logratios
 # formula = ~1+Y, # !!! creates a bug at interpolation
  formula = ~1,  # constant mean, no drift
  nmax = 30,     # cokriging neighbourhood has max. 30 samples
  maxdist = 300,
  omax = 6,
  model = as.LMCAnisCompo(lrmdf)  # necessary conversion
)

#logratio cokriging
cokrig_alr = predict(gmv, newdata=xy_grid, debug.level=-1)
## starting cokriging 
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
## 
 27% done
 41% done
 60% done
100% done

# cutoff of maximum variance
var_cutoff <- cokrig_alr$alr1.var %>% quantile(0.9, na.rm = T)
cokrig_alr <- cokrig_alr %>% as.data.table()
cokrig_alr <- cokrig_alr[alr1.var < var_cutoff]

#backtransforming to original variables
cokrig_compo = gsi.gstatCokriging2compo(
  cokrig_alr, V="alr", orignames = colnames(dtcomp))

#transform into percent
cokrig_compo <- cokrig_compo %>% as.data.frame %>% multiply_by(100)

#model performance for comparison

cok_alr_val <- predict(gmv, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6)
## starting cokriging 
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
## 
  3% done
100% done
cok_alr_val = cbind(cok_alr_val[,1:2],
                    gsi.gstatCokriging2compo(
                      cok_alr_val, V="alr", orignames = colnames(dtcomp)) %>% 
                      as.data.frame %>% multiply_by(100)
)

Al

 
ggplot(data = cbind(cokrig_alr[, c("X", "Y")], cokrig_compo),
       aes(x = X, y = Y, fill = Al))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Al [%]")

Co

 
ggplot(data = cbind(cokrig_alr[, c("X", "Y")], cokrig_compo),
       aes(x = X, y = Y, fill = Co))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
  geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Co [%]")

Ni

 
ggplot(data = cbind(cokrig_alr[, c("X", "Y")], cokrig_compo),
       aes(x = X, y = Y, fill = Ni))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
  geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Ni [%]")

9 Cosimulations VS1 - VS3

In the following, the cosimulations are shown. They reuse the variogram parameters fitted for their kriging counterparts, so the results in this chapter are only summarized. First, the actual modeling is executed for the three models VS1 - VS3.

For this preliminary modeling, the number of realizations is set to 10.


n=10
# simulation for cokriging with log-values
set.seed(123)
cosim = predict(gs_l, newdata=xy_grid, debug.level=-1, nmax=20, omax = 6, nsim = n)
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
## 
100% done


# simulation for logratio compositions

cos_alr = predict(gmv, newdata=xy_grid, debug.level=-1,nmax=20, omax=6, nsim=n) 
## starting cokriging 
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
## 
100% done
cos_alr1 <- cos_alr  # backup copy, so this chunk can be rerun from here
# outcome analysis
class(cos_alr)  
## [1] "data.frame"
dim(cos_alr) 
## [1] 15759    52
colnames(cos_alr)
##  [1] "X"          "Y"          "alr1.sim1"  "alr1.sim2"  "alr1.sim3" 
##  [6] "alr1.sim4"  "alr1.sim5"  "alr1.sim6"  "alr1.sim7"  "alr1.sim8" 
## [11] "alr1.sim9"  "alr1.sim10" "alr2.sim1"  "alr2.sim2"  "alr2.sim3" 
## [16] "alr2.sim4"  "alr2.sim5"  "alr2.sim6"  "alr2.sim7"  "alr2.sim8" 
## [21] "alr2.sim9"  "alr2.sim10" "alr3.sim1"  "alr3.sim2"  "alr3.sim3" 
## [26] "alr3.sim4"  "alr3.sim5"  "alr3.sim6"  "alr3.sim7"  "alr3.sim8" 
## [31] "alr3.sim9"  "alr3.sim10" "alr4.sim1"  "alr4.sim2"  "alr4.sim3" 
## [36] "alr4.sim4"  "alr4.sim5"  "alr4.sim6"  "alr4.sim7"  "alr4.sim8" 
## [41] "alr4.sim9"  "alr4.sim10" "alr5.sim1"  "alr5.sim2"  "alr5.sim3" 
## [46] "alr5.sim4"  "alr5.sim5"  "alr5.sim6"  "alr5.sim7"  "alr5.sim8" 
## [51] "alr5.sim9"  "alr5.sim10"

# backtransform all realisations into compositions, one list entry per realisation
cos_alr <- cos_alr1
cos_alr <- cos_alr[,-c(1,2)]  # drop the coordinate columns
alrs <- c("alr1", "alr2", "alr3", "alr4","alr5")
cos_comp <- list()
for (i in 1:n){
  cos_comp[[i]] <- cos_alr %>% select(paste0(alrs,".sim", i)) %>% alrInv(z = ., orig = dtcomp)
}



# mixed model cosimulation

cos_cs = predict(gs_cs, newdata=xy_grid, debug.level = -1, nmax=20, omax = 6, nsim=n) %>% as.data.table()
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
## 
100% done

# back up results so that later chunks can be changed and rerun

cosim.BU <- cosim
cos_comp.BU <- cos_comp
cos_cs.BU <- cos_cs

9.1 Exemplary Results

The subsequent plots show one realization of each modeling procedure, using Ni (back-transformed) as an example. This only allows a general qualitative overview of the methods.
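As a small base-R aside on the back-transformation used here: the log-scale simulations are exponentiated first and summarized afterwards. The order of these two operations matters, which is one reason simulation means can differ from a back-transformed kriging estimate:

```r
# Averaging after exponentiation vs. exponentiating the average:
# for log-scale data these differ (Jensen's inequality).
z <- c(-1, 0, 1)   # toy values on the log scale
mean(exp(z))       # arithmetic mean on the original scale, approx. 1.36
exp(mean(z))       # back-transformed log-mean (geometric mean), exactly 1
```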



b <- ggplot(data = cosim, aes(x=X, y=Y, fill=exp(Ni_log.sim1)))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 #geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Ni [%]", title ="VS1")

a <- ggplot(data = cos_cs, aes(x=X, y=Y, fill=exp(Ni_log.sim1)))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 #geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Ni [%]", title ="VS2")


c <- ggplot(data = cbind(cos_comp[[3]] %>% as.data.frame(), cosim[, c("X", "Y")]),
            aes(x=X, y=Y, fill=Ni*100))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 #geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Ni [%]", title ="VS3")

grid.arrange(b, a,c, nrow=2)

From a qualitative point of view, the results are relatively similar. The distribution of the values, however, deviates from realization to realization; especially in the SW area of the survey field, a varied distribution can be seen.

9.2 Mean Values

To allow a more condensed assessment of the three methods, the mean values over all realizations are shown below.

# mean values of cosimulations

#reset containers from BU (not necessary, only for rerunning this chunk with varied options)
cosim.BU -> cosim
cos_comp.BU -> cos_comp
cos_cs.BU -> cos_cs

#cosimulation with logvalues VS1

for (i in fkt_new) {
    cosim[,paste0(i,".mean")] <-  cosim %>% select(contains(i)) %>% exp() %>% apply(.,1, mean) %>% as.vector()
    cosim[,paste0(i,".vc")] <-  cosim %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
    
    }

# mixed model VS2
for (i in fkt_new) {
    cos_cs[,paste0(i,".mean")] <-  cos_cs %>% select(contains(i))%>% exp() %>% apply(.,1, mean) %>% as.vector() 
    cos_cs[,paste0(i,".vc")] <-  cos_cs %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
}

# logratio cosimulation VS3

cos_comp1 <- array(as.numeric(unlist(cos_comp)), dim=c(nrow(cos_comp[[1]]),
                                               ncol(cos_comp[[1]]),
                                                    n))

cos_comp_sum <- data.table()
o <- 0
for (i in colnames.num) {
  o <- o+1
  cos_comp_sum[,paste0(i,".mean")] <- cos_comp1[,o,] %>% as.data.frame() %>% apply(.,1, mean) %>% as.vector()*100
  cos_comp_sum[,paste0(i,".vc")] <- cos_comp1[,o,]  %>% as.data.frame() %>% apply(.,1, function (x)(sd(x*100)/mean(x*100))) %>% as.vector()
}

cos_comp_sum <- cbind(cos_comp_sum, X=cosim$X, Y=cosim$Y)
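The `unlist`/`array` reshaping above relies on R's column-major filling order (rows, then columns, then realizations). A toy check with two hypothetical "realizations" of a 3 x 2 data frame:

```r
# Two fake realizations, each 3 locations x 2 variables
lst <- list(data.frame(a = 1:3, b = 4:6),
            data.frame(a = 7:9, b = 10:12))
arr <- array(as.numeric(unlist(lst)), dim = c(3, 2, 2))
arr[1, 1, ]   # variable "a", location 1, across both realizations: 1 and 7
arr[, 2, 2]   # variable "b" of the second realization: 10 11 12
```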

a <- ggplot(data = cosim, aes(x=X, y=Y, fill=Ni_log.mean))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 #geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Ni [%]", title ="VS1")


b <- ggplot(data = cos_cs, aes(x=X, y=Y, fill=Ni_log.mean))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 #geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Ni [%]", title ="VS2")




c <- ggplot(data = cos_comp_sum, aes(x=X, y=Y, fill=Ni.mean))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 #geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Ni [%]", title ="VS3")

maxlegend <- c(a$data$Ni_log.mean, b$data$Ni_log.mean, c$data$Ni.mean) %>% max() # the third plot stores the mean as Ni.mean

a <- a+scale_fill_gradientn(colors = c("blue2","purple", "red2"), limits=c(0,maxlegend))
b <- b+scale_fill_gradientn(colors = c("blue2","purple", "red2"), limits=c(0,maxlegend))
c <- c+scale_fill_gradientn(colors = c("blue2","purple", "red2"), limits=c(0,maxlegend))


grid.arrange(a,b,c, nrow=2)

To allow a comparison, the maximum of the color scale was chosen to be the maximum of all three plots ( 1.976 %).

Generally, all three plots show similar results. In the SW of the survey field, a structure with higher mineralization can be found. Furthermore, the results for VS1 are generally higher in the peak areas, and the peak also seems to cover a larger area. The results of VS2 and VS3 show similar values, with VS3 showing lower values in the mineralized areas.

Coefficient of variation

In the following, the distribution of the coefficient of variation, calculated over all simulated realizations, is shown. This allows an assessment of where the variation is higher or lower, and of differences in the general behavior of the simulation techniques. In a practical mine-planning application, it also helps to identify areas where element contents are expected to vary more; such information could, for example, influence stockpiling strategies.
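The per-cell statistics follow the same pattern as in the mean-value chunk above; a self-contained sketch on a toy matrix (3 locations x 4 realizations):

```r
# Row-wise mean and coefficient of variation (sd/mean) across realizations
sims <- matrix(c(1.0, 1.2, 0.9, 1.1,
                 2.0, 2.5, 1.8, 2.2,
                 0.5, 0.4, 0.6, 0.5), nrow = 3, byrow = TRUE)
row_mean <- apply(sims, 1, mean)
row_vc   <- apply(sims, 1, function(x) sd(x) / mean(x))
```

Being dimensionless, the coefficient of variation allows comparing the spread of cells with very different grade levels.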


b <- ggplot(data = cosim, aes(x=X, y=Y, fill=Ni_log.vc))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 #geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="VC", title ="Cosimulation w. log-values")


a <- ggplot(data = cos_cs, aes(x=X, y=Y, fill=Ni_log.vc))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 #geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="VC", title ="Mixed model cosimulation w. log-values")




c <- ggplot(data = cos_comp_sum, aes(x=X, y=Y, fill=Ni.vc))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 #geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="VC", title ="Compositional logratio cosimulation")


grid.arrange(b,a,c, nrow=2)

Large differences appear between VS3 and the first two simulation methods. Where VS1 and VS2 show areas with heightened variation, the logratio cosimulation shows a more homogeneous picture. Inside the sampled area, the variation is clearly smaller than outside of it. VS1 and VS2 show smaller patches of high variance at the borders of the survey area. Furthermore, the influence of the anisotropy in VS2 can be seen: it stretches the highly varying areas into a more elliptic shape, with their major axes following the anisotropy angle chosen before (156°).

10 Model performance - Cross validation

In order to evaluate the performance of the six models, the validation part of the data set is used. The predicted (back-transformed) values are compared to the real values of the validation set. For these calculations, again 10 realizations are used for the simulations.




# simulation for cokriging with log-values V1
set.seed(123)
cosim_val = predict(gs_l, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6, nsim = n)
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
## 
100% done



# mixed model cosimulation

cos_cs_val = predict(gs_cs, newdata=dt[subset=="validation", c("X", "Y")], debug.level = -1, nmax=20, omax = 6, nsim=n) %>% as.data.table()
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
## 
100% done


# simulation for logratio compositions

cos_alr_val = predict(gmv, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1,nmax=20, omax=6, nsim=n) 
## starting cokriging 
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
## 
100% done


# backtransform validation realisations into compositions
cos_alr_val <- cos_alr_val[,-c(1,2)]  # drop the coordinate columns
alrs <- c("alr1", "alr2", "alr3", "alr4","alr5")
cos_comp_val <- list()
for (i in 1:n){
  cos_comp_val[[i]] <- cos_alr_val %>% select(paste0(alrs,".sim", i)) %>% alrInv(z = ., orig = dtcomp)
}


# mean values for validation

#cosimulation with logvalues

for (i in fkt_new) {
    cosim_val[,paste0(i,".mean")] <-  cosim_val %>% select(contains(i)) %>% exp() %>% apply(.,1, mean) %>% as.vector()
    cosim_val[,paste0(i,".vc")] <-  cosim_val %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
    
    }

# mixed model
for (i in fkt_new) {
    cos_cs_val[,paste0(i,".mean")] <-  cos_cs_val %>% select(contains(i))%>% exp() %>% apply(.,1, mean) %>% as.vector() 
    cos_cs_val[,paste0(i,".vc")] <-  cos_cs_val %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
}

# logratio cosimulation

cos_comp1_val <- array(as.numeric(unlist(cos_comp_val)), dim=c(nrow(cos_comp_val[[1]]),
                                               ncol(cos_comp_val[[1]]),
                                                    n))

cos_comp_sum_val <- data.table()
o <- 0
for (i in colnames.num) {
  o <- o+1
  cos_comp_sum_val[,paste0(i,".mean")] <- cos_comp1_val[,o,] %>% as.data.frame() %>% apply(.,1, mean) %>% as.vector() *100 # multiply by 100 to obtain %
  cos_comp_sum_val[,paste0(i,".vc")] <- cos_comp1_val[,o,]  %>% as.data.frame() %>% apply(.,1, function (x)(sd(x*100)/mean(x*100))) %>% as.vector()  # the vc is scale-invariant
}

cos_comp_sum_val <- cbind(cos_comp_sum_val,
                          X=dt[subset=="validation", c("X")],
                          Y=dt[subset=="validation", c("Y")])

10.1 Mean absolute error

The mean absolute error of the prediction is calculated.
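The per-model summaries below all compute the same measure element by element. A hypothetical helper (not used in the report; the names are invented) condenses the three error measures of this chapter:

```r
# MAE, MRE and MD for a vector of predictions vs. observations,
# both on the original (back-transformed) scale
error_summary <- function(pred, obs) {
  c(MAE = mean(abs(obs - pred)),        # mean absolute error
    MRE = mean(abs(obs - pred) / obs),  # mean relative error
    MD  = mean(obs - pred))             # mean deviation (signed)
}
error_summary(pred = c(1.0, 2.0, 3.0), obs = c(1.1, 1.9, 3.3))
```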

#dataframe consisting of predicted and actual validation values

#for VK1 cokriging with log values 
dt.kv_l <- data.table(cokriged_val %>% select(contains(".pred")) %>% exp,
                      dt[subset=="validation"] %>% select(fkt_new) %>% exp)



sumry_l <- dt.kv_l %>% summarise(.,
                                 Name = "VK1",
                Al_MAE = mean(abs(Al_log-Al_log.pred)),
                Co_MAE = mean(abs(Co_log-Co_log.pred)),
                Fe_MAE = mean(abs(Fe_log-Fe_log.pred)),
                Filler_MAE = mean(abs(Filler_log-Filler_log.pred)),
                Mg_MAE = mean(abs(Mg_log-Mg_log.pred)),
                Ni_MAE = mean(abs(Ni_log-Ni_log.pred)))

#for VK2 mixed model cokriging
dt.kv_cs <- data.table(cok_cs_val %>% select(paste0(fkt_new,".pred")) %>% exp,
                      dt[subset=="validation"] %>% select(fkt_new) %>% exp)



sumry_cs <- dt.kv_cs %>% summarise(.,
                                   Name = "VK2",
                Al_MAE = mean(abs(Al_log-Al_log.pred)),
                Co_MAE = mean(abs(Co_log-Co_log.pred)),
                Fe_MAE = mean(abs(Fe_log-Fe_log.pred)),
                Filler_MAE = mean(abs(Filler_log-Filler_log.pred)),
                Mg_MAE = mean(abs(Mg_log-Mg_log.pred)),
                Ni_MAE = mean(abs(Ni_log-Ni_log.pred)))


# for VK3 logratio cokriging

dt.alr <- data.table(cok_alr_val[,-c(1,2)] ,
                     dt[subset=="validation"] %>% select(fkt_new) %>% exp)


sumry_alr <- dt.alr %>% summarise(.,
                                  Name = "VK3",
                Al_MAE = mean(abs((Al_log)-(Al))),
                Co_MAE = mean(abs((Co_log)-(Co))),
                Fe_MAE = mean(abs((Fe_log)-(Fe))),
                Filler_MAE = mean(abs((Filler_log)-(Filler))),
                Mg_MAE = mean(abs((Mg_log)-(Mg))),
                Ni_MAE = mean(abs((Ni_log)-(Ni))))

#######for cosimulations######
#for VS1 cosimulation with log values 
dt.cv_l <- data.table(cosim_val %>% select(contains(".mean")),
                      dt[subset=="validation"] %>% select(fkt_new) %>% exp)



sumry_cvl <- dt.cv_l %>% summarise(.,
                                 Name = "VS1",
                Al_MAE = mean(abs(Al_log-Al_log.mean)),
                Co_MAE = mean(abs(Co_log-Co_log.mean)),
                Fe_MAE = mean(abs(Fe_log-Fe_log.mean)),
                Filler_MAE = mean(abs(Filler_log-Filler_log.mean)),
                Mg_MAE = mean(abs(Mg_log-Mg_log.mean)),
                Ni_MAE = mean(abs(Ni_log-Ni_log.mean)))

#for VS2 mixed model cosimulation
dt.cv_cs <- data.table(cos_cs_val %>% select(paste0(fkt_new,".mean")),
                      dt[subset=="validation"] %>% select(fkt_new) %>% exp)



sumry_cv_cs <- dt.cv_cs %>% summarise(.,
                                   Name = "VS2",
                Al_MAE = mean(abs(Al_log-Al_log.mean)),
                Co_MAE = mean(abs(Co_log-Co_log.mean)),
                Fe_MAE = mean(abs(Fe_log-Fe_log.mean)),
                Filler_MAE = mean(abs(Filler_log-Filler_log.mean)),
                Mg_MAE = mean(abs(Mg_log-Mg_log.mean)),
                Ni_MAE = mean(abs(Ni_log-Ni_log.mean)))


# for VS3 logratio cosimulation

dt.cos_alr <- data.table(cos_comp_sum_val ,
                     dt[subset=="validation"] %>% select(fkt_new) %>% exp)


sumry_cos_alr <- dt.cos_alr %>% summarise(.,
                                  Name = "VS3",
                Al_MAE = mean(abs(Al_log-Al.mean)),
                Co_MAE = mean(abs(Co_log-Co.mean)),
                Fe_MAE = mean(abs(Fe_log-Fe.mean)),
                Filler_MAE = mean(abs(Filler_log-Filler.mean)),
                Mg_MAE = mean(abs(Mg_log-Mg.mean)),
                Ni_MAE = mean(abs(Ni_log-Ni.mean)))



####Summary####
sum_val <- rbind(sumry_l, sumry_cs, sumry_alr,
                 sumry_cvl,sumry_cv_cs,sumry_cos_alr) %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
sum_val
Name Al_MAE Co_MAE Fe_MAE Filler_MAE Mg_MAE Ni_MAE
VK1 2.202556 0.0447709 8.856579 7.082687 3.946235 0.3153578
VK2 2.270978 0.0443517 9.107742 7.221644 4.011018 0.3164846
VK3 2.201928 0.0445365 8.806227 8.022171 3.882312 0.3085519
VS1 2.644393 0.0549201 9.684120 7.184744 4.175863 0.3193899
VS2 2.632025 0.0559316 9.170857 7.292397 3.990590 0.3170055
VS3 6.447633 0.0540879 9.678661 8.847417 3.863677 0.3230897






# temp <- list()

# for (i in fkt_new) {
# temp[[i]] <- ggplot(data=dt.kv_l, aes(x = get(paste0(i, ".pred")), y=get(i)))+
#   geom_point(size=0.5)+
#   geom_abline(intercept = 0)+
#   labs(title = i, x="Pred. V.", y= "Real V.")
#
#
# }
#
# do.call("grid.arrange", c(temp, ncol=3))

The variants generally show a similar error. Within this assignment, the variable Ni is of particular interest; for it, VS1 and VS2 show very similar, low errors, while VK3 shows the best overall performance.

10.2 Mean relative error

#dataframe consisting of predicted and actual validation values

#for VK1 cokriging with log values 
dt.kv_l <- data.table(cokriged_val %>% select(contains(".pred")) %>% exp,
                      dt[subset=="validation"] %>% select(fkt_new) %>% exp)



sumry_l <- dt.kv_l %>% summarise(.,
                                 Name = "VK1",
                Al_MRE = mean(abs(Al_log-Al_log.pred)/Al_log),
                Co_MRE = mean(abs(Co_log-Co_log.pred)/Co_log),
                Fe_MRE = mean(abs(Fe_log-Fe_log.pred)/Fe_log),
                Filler_MRE = mean(abs(Filler_log-Filler_log.pred)/Filler_log),
                Mg_MRE = mean(abs(Mg_log-Mg_log.pred)/Mg_log),
                Ni_MRE = mean(abs(Ni_log-Ni_log.pred)/Ni_log))

#for VK2 mixed model cokriging
dt.kv_cs <- data.table(cok_cs_val %>% select(paste0(fkt_new,".pred"))%>% exp,
                      dt[subset=="validation"] %>% select(fkt_new)%>% exp)



sumry_cs <- dt.kv_cs %>% summarise(.,
                                   Name = "VK2",
                Al_MRE = mean(abs(Al_log-Al_log.pred)/Al_log),
                Co_MRE = mean(abs(Co_log-Co_log.pred)/Co_log),
                Fe_MRE = mean(abs(Fe_log-Fe_log.pred)/Fe_log),
                Filler_MRE = mean(abs(Filler_log-Filler_log.pred)/Filler_log),
                Mg_MRE = mean(abs(Mg_log-Mg_log.pred)/Mg_log),
                Ni_MRE = mean(abs(Ni_log-Ni_log.pred)/Ni_log))


# for VK3 logratio cokriging

dt.alr <- data.table(cok_alr_val[,-c(1,2)] ,
                     dt[subset=="validation"] %>% select(fkt_new) %>% exp)


sumry_alr <- dt.alr %>% summarise(.,
                                  Name = "VK3",
                Al_MRE = mean(abs(Al_log-Al)/Al_log),
                Co_MRE = mean(abs(Co_log-Co)/Co_log),
                Fe_MRE = mean(abs(Fe_log-Fe)/Fe_log),
                Filler_MRE = mean(abs(Filler_log-Filler)/Filler_log),
                Mg_MRE = mean(abs(Mg_log-Mg)/Mg_log),
                Ni_MRE = mean(abs(Ni_log-Ni)/Ni_log))

#######for cosimulations######
#for VS1 cosimulation with log values 
dt.cv_l <- data.table(cosim_val %>% select(contains(".mean")),
                      dt[subset=="validation"] %>% select(fkt_new) %>% exp)



sumry_cvl <- dt.cv_l %>% summarise(.,
                                 Name = "VS1",
                Al_MRE = mean(abs(Al_log-Al_log.mean)/Al_log),
                Co_MRE = mean(abs(Co_log-Co_log.mean)/Co_log),
                Fe_MRE = mean(abs(Fe_log-Fe_log.mean)/Fe_log),
                Filler_MRE = mean(abs(Filler_log-Filler_log.mean)/Filler_log),
                Mg_MRE = mean(abs(Mg_log-Mg_log.mean)/Mg_log),
                Ni_MRE = mean(abs(Ni_log-Ni_log.mean)/Ni_log))

#for VS2 mixed model cosimulation
dt.cv_cs <- data.table(cos_cs_val %>% select(paste0(fkt_new,".mean")),
                      dt[subset=="validation"] %>% select(fkt_new) %>% exp)



sumry_cv_cs <- dt.cv_cs %>% summarise(.,
                                   Name = "VS2",
                Al_MRE = mean(abs(Al_log-Al_log.mean)/Al_log),
                Co_MRE = mean(abs(Co_log-Co_log.mean)/Co_log),
                Fe_MRE = mean(abs(Fe_log-Fe_log.mean)/Fe_log),
                Filler_MRE = mean(abs(Filler_log-Filler_log.mean)/Filler_log),
                Mg_MRE = mean(abs(Mg_log-Mg_log.mean)/Mg_log),
                Ni_MRE = mean(abs(Ni_log-Ni_log.mean)/Ni_log))


# for VS3 logratio cosimulation

dt.cos_alr <- data.table(cos_comp_sum_val ,
                     dt[subset=="validation"] %>% select(fkt_new) %>% exp)


sumry_cos_alr <- dt.cos_alr %>% summarise(.,
                                  Name = "VS3",
                Al_MRE = mean(abs(Al_log-Al.mean)/Al_log),
                Co_MRE = mean(abs(Co_log-Co.mean)/Co_log),
                Fe_MRE = mean(abs(Fe_log-Fe.mean)/Fe_log),
                Filler_MRE = mean(abs(Filler_log-Filler.mean)/Filler_log),
                Mg_MRE = mean(abs(Mg_log-Mg.mean)/Mg_log),
                Ni_MRE = mean(abs(Ni_log-Ni.mean)/Ni_log))



####Summary####
sum_val_MRE <- rbind(sumry_l, sumry_cs, sumry_alr,
                 sumry_cvl,sumry_cv_cs,sumry_cos_alr)

sum_val_MRE %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Name Al_MRE Co_MRE Fe_MRE Filler_MRE Mg_MRE Ni_MRE
VK1 0.8038649 1.032103 0.4929078 0.1135097 1.762340 0.6288471
VK2 0.8314625 1.021722 0.5062684 0.1159618 1.779700 0.6405719
VK3 0.8615388 1.103832 0.5315130 0.1367375 1.847911 0.6715783
VS1 1.1192446 1.872803 0.5595885 0.1154562 2.359084 0.6804091
VS2 1.2189431 2.145350 0.5609244 0.1175318 2.419242 0.6784959
VS3 3.1697966 1.963536 0.5921043 0.1392523 2.168711 0.6762911








The MRE is very high, ranging from 0.11-0.14 for Filler up to 3.17 for Al under VS3. For the focus variable Ni, the lowest MRE among the simulations is achieved by VS3. However, VS3 performs badly for Al, so it will not be used for the final resource estimation. The next constraint is the original task's demand to assess the expectable variation of the resource estimate, for which a Gaussian simulation has to be utilized. Hence, VS1 is chosen for the final estimation, as it provides slightly smaller prediction errors than the more complex model VS2. A further reason against VS2 is its very long calculation times and the frequent freezes that occurred when running it with more than 10 realizations.

10.3 Mean deviation

The mean deviation gives an overview of whether a model tends to over- or underpredict. However, it must not be used as the sole error measure: since positive and negative errors cancel out, it can be misleading and should only be used in conjunction with measures like the ones above.
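The cancellation effect can be shown with a few lines of base R (toy numbers):

```r
# Symmetric over- and underprediction cancels out in the mean deviation,
# even though every single prediction is off by 1.
obs  <- c(1, 1, 1, 1)
pred <- c(0, 2, 0, 2)
mean(obs - pred)       # 0: deviations cancel
mean(abs(obs - pred))  # 1: the absolute error reveals the misfit
```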

#dataframe consisting of predicted and actual validation values

#for VK1 cokriging with log values 
dt.kv_l <- data.table(cokriged_val %>% select(contains(".pred")) %>% exp(),
                      dt[subset=="validation"] %>% select(fkt_new) %>% exp())



sumry_l <- dt.kv_l %>% summarise(.,
                                 Name = "VK1",
                Al_MD = mean((Al_log-Al_log.pred)),
                Co_MD = mean((Co_log-Co_log.pred)),
                Fe_MD = mean((Fe_log-Fe_log.pred)),
                Filler_MD = mean((Filler_log-Filler_log.pred)),
                Mg_MD = mean((Mg_log-Mg_log.pred)),
                Ni_MD = mean((Ni_log-Ni_log.pred)))

#for VK2 mixed model cokriging
dt.kv_cs <- data.table(cok_cs_val %>% select(paste0(fkt_new,".pred")) %>% exp(),
                      dt[subset=="validation"] %>% select(fkt_new)%>% exp())



sumry_cs <- dt.kv_cs %>% summarise(.,
                                   Name = "VK2",
                Al_MD = mean((Al_log-Al_log.pred)),
                Co_MD = mean((Co_log-Co_log.pred)),
                Fe_MD = mean((Fe_log-Fe_log.pred)),
                Filler_MD = mean((Filler_log-Filler_log.pred)),
                Mg_MD = mean((Mg_log-Mg_log.pred)),
                Ni_MD = mean((Ni_log-Ni_log.pred)))


# for VK3 logratio cokriging

dt.alr <- data.table(cok_alr_val[,-c(1,2)] ,
                     dt[subset=="validation"] %>% select(fkt_new) %>% exp)


sumry_alr <- dt.alr %>% summarise(.,
                                  Name = "VK3",
                Al_MD = mean(((Al_log)-(Al))),
                Co_MD = mean(((Co_log)-(Co))),
                Fe_MD = mean(((Fe_log)-(Fe))),
                Filler_MD = mean(((Filler_log)-(Filler))),
                Mg_MD = mean(((Mg_log)-(Mg))),
                Ni_MD = mean(((Ni_log)-(Ni))))

#######for cosimulations######
#for VS1 cosimulation with log values 
dt.cv_l <- data.table(cosim_val %>% select(contains(".mean")),
                      dt[subset=="validation"] %>% select(fkt_new) %>% exp())



sumry_cvl <- dt.cv_l %>% summarise(.,
                                 Name = "VS1",
                Al_MD = mean((Al_log-(Al_log.mean))),
                Co_MD = mean((Co_log-(Co_log.mean))),
                Fe_MD = mean((Fe_log-(Fe_log.mean))),
                Filler_MD = mean((Filler_log-(Filler_log.mean))),
                Mg_MD = mean((Mg_log-(Mg_log.mean))),
                Ni_MD = mean((Ni_log-(Ni_log.mean))))

#for VS2 mixed model cosimulation
dt.cv_cs <- data.table(cos_cs_val %>% select(paste0(fkt_new,".mean")),
                      dt[subset=="validation"] %>% select(fkt_new) %>% exp())



sumry_cv_cs <- dt.cv_cs %>% summarise(.,
                                   Name = "VS2",
                Al_MD = mean((Al_log-(Al_log.mean))),
                Co_MD = mean((Co_log-(Co_log.mean))),
                Fe_MD = mean((Fe_log-(Fe_log.mean))),
                Filler_MD = mean((Filler_log-(Filler_log.mean))),
                Mg_MD = mean((Mg_log-(Mg_log.mean))),
                Ni_MD = mean((Ni_log-(Ni_log.mean))))


# for VS3 logratio cosimulation

dt.cos_alr <- data.table(cos_comp_sum_val ,
                     dt[subset=="validation"] %>% select(fkt_new) %>% exp)


sumry_cos_alr <- dt.cos_alr %>% summarise(.,
                                  Name = "VS3",
                Al_MD = mean(((Al_log)-(Al.mean))),
                Co_MD = mean(((Co_log)-(Co.mean))),
                Fe_MD = mean(((Fe_log)-(Fe.mean))),
                Filler_MD = mean(((Filler_log)-(Filler.mean))),
                Mg_MD = mean(((Mg_log)-(Mg.mean))),
                Ni_MD = mean(((Ni_log)-(Ni.mean))))



####Summary####
sum_val <- rbind(sumry_l, sumry_cs, sumry_alr,
                 sumry_cvl,sumry_cv_cs,sumry_cos_alr) %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
sum_val
Name Al_MD Co_MD Fe_MD Filler_MD Mg_MD Ni_MD
VK1 0.5092826 -0.9774758 2.0527079 0.3578403 2.389288 0.1378451
VK2 0.8875703 0.0256881 2.8189492 0.4690732 3.069060 0.1606239
VK3 0.5904066 0.0234014 1.3840424 -4.9421026 2.823281 0.1209711
VS1 -0.5088598 -0.0009743 0.7968986 0.1361651 1.614972 0.1067408
VS2 -0.7425656 -0.0044831 0.8172421 0.2158945 2.470693 0.1238483
VS3 -5.7674948 0.0023425 1.3993417 1.7786226 2.456502 0.1306857







The mean deviation of prediction versus real value indicates whether the predictions generally tend to over- or underpredict. The kriging methods generally predict values that are too low, except for VK3-Filler, where the prediction is on average about 5 weight-% too high. The cosimulations also underpredict in most cases, except for Al and Co, where higher values are predicted. VS1 yields the lowest error in this category.

10.4 Prediction vs. Validation Scatterplot

To obtain additional detail, the predicted values are plotted against the real values from the validation dataset. This allows a qualitative insight into the model quality. Again, this is done for the variable under focus: Ni.


i <- "Ni"

a <- ggplot(data=dt.kv_l, aes(x = get(paste0(i, "_log.pred"))%>% exp(), y=get(paste0(i, "_log"))%>% exp()))+
  geom_point(size=0.5)+
  geom_abline(intercept = 0)+
  coord_fixed()+
  labs(title = "VK1", x="Pred. V.", y= "Real V.")



b <- ggplot(data=dt.kv_cs, aes(x = get(paste0(i, "_log.pred"))%>% exp(), y=get(paste0(i, "_log"))%>% exp()))+
  geom_point(size=0.5)+
  geom_abline(intercept = 0)+
  coord_fixed()+
  labs(title = "VK2", x="Pred. V.", y= "Real V.")


c <- ggplot(data=dt.alr, aes(x = get(paste0(i)), y=get(paste0(i, "_log"))))+
  geom_point(size=0.5)+
  geom_abline(intercept = 0)+
  coord_fixed()+
  labs(title = "VK3", x="Pred. V.", y= "Real V.")


d <- ggplot(data=dt.cv_l, aes(x = get(paste0(i, "_log.mean")), y=get(paste0(i, "_log"))%>% exp()))+
  geom_point(size=0.5)+
  geom_abline(intercept = 0)+
  coord_fixed()+
  labs(title = "VS1", x="Pred. V.", y= "Real V.")


e <- ggplot(data=dt.cv_cs, aes(x = get(paste0(i, "_log.mean")), y=get(paste0(i, "_log")) %>% exp()))+
  geom_point(size=0.5)+
  geom_abline(intercept = 0)+
  coord_fixed()+
  labs(title = "VS2", x="Pred. V.", y= "Real V.")



f <- ggplot(data=dt.cos_alr, aes(x = get(paste0(i, ".mean")), y= get(paste0(i, "_log"))))+
  geom_point(size=0.5)+
  geom_abline(intercept = 0)+
  coord_fixed()+
  labs(title = "VS3", x="Pred. V.", y= "Real V.")



 grid.arrange(a,b,c,d,e,f, nrow=2)

Here, the picture from the mean deviations repeats itself: all methods underpredict Ni to some extent. However, VS1 and VS3 show the best fit from a qualitative point of view. A large portion of the underprediction can be attributed to relatively few outliers with very high real values. Additionally, VS2 tends to overpredict at low levels. In general, none of the models can reliably predict the scarce, very high values. This can be put into context with two points:

  • the observed problems in predicting the scarce Lcode SA, and
  • the fact that SA tends to house the highest Ni values.

It thus appears that splitting the dataset in a chessboard-like pattern created very small structures with high Ni contents that were not really captured by the variography. For the final resource estimation, however, the complete dataset is used, which should mitigate this effect to some extent.

11 Final Resource Estimation

VS1 is used for the final estimation, as it showed the best validation results for Ni among the Gaussian simulation models and requires much less computation time while running more reliably. The Lcode and the chemical variables are computed separately; first, the chemical components are simulated.

# redraw extents (same as above - just for debugging)
extends <-dt[,c("X", "Y")] %>% sapply(range) %>% as.data.frame()
x = seq(from=min(extends$X)-100, to=max(extends$X)+100, by=10)
y = seq(from=min(extends$Y)-100, to=max(extends$Y)+100, by=10)
xy_grid = expand.grid(X=x, Y=y)

#initiate container
{gs_l_f=NULL
for(j in vars_l){
  frm = as.formula(paste(j,1, sep="~"))
  print(frm)
  gs_l_f = gstat(gs_l_f, id=j, formula=frm, locations=~X+Y, data=dt, nmax = 20, omax=6, maxdist = 300)
}
}
## Al_log ~ 1
## Co_log ~ 1
## Fe_log ~ 1
## Filler_log ~ 1
## Mg_log ~ 1
## Ni_log ~ 1

# create variograms
vge_l_f = gstat::variogram(gs_l_f,width=35, cutoff= 300) 

#create initial varmodels
vgt_l_m = vgm(psill=0.5, range=120, nugget=1,  model="Sph", add.to=vgm(psill=0.8, range=300,   model="Gau"))

#fitting
gs_l_f = gstat::fit.lmc(v=vge_l_f, g = gs_l_f, model = vgt_l_m, correct.diagonal = 1.00001)

gs_l_f_pred = predict(gs_l_f, newdata=xy_grid, debug.level = -1, nmax=20, omax = 6,nsim = 30) 
## drawing 30 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
## (simulation progress output trimmed: 0% done ... 100% done)



saveRDS(gs_l_f_pred, "prediction_final.RDS")

11.0.1 Lcode

Subsequently, the Lcode is simulated similarly to the previously used method IK. The anisotropic variograms for the individual rock types are used, and the Lcodes are simulated sequentially, assuming spatial independence between the variables.

#sequential - as IK kriging
gs_i_FZ <- gstat(id="FZ", formula = (Lcode=="FZ")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200)

gs_i_SA <- gstat(id="SA", formula = (Lcode=="SA")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200) 

gs_i_SM <- gstat(id="SM", formula = (Lcode=="SM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200) 
  
gs_i_UM <- gstat(id="UM", formula = (Lcode=="UM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200) 

Lcodes <- c("FZ", "SA", "SM", "UM")
#compute the empirical variograms in a loop for each Lcode
for (i in c("FZ", "SA", "SM", "UM")){
  assign(paste0("vg_i_", i), 
         variogram(get(paste0("gs_i_", i)),
                   width=35, cutoff=200, alpha=c(0:5)*30 ))
}

#fit FZ
#plot(vg_i_FZ)
vgt_FZ  <- vgm(psill=0.1, model="Wav", range=120, nugget=0,anis = c(40,0.4), 
               add.to = vgm(psill=0.15, range=350,  nugget=0.1, model="Gau"))
  
 # plot(vg_i_FZ, model=vgt_FZ)

#  fit SA
#plot(vg_i_SA)
vgt_SA  <- vgm(psill=0.05, model="Wav", range=50, nugget=0.03,anis = c(0,0.4))
  
  #plot(vg_i_SA, model=vgt_SA)
  
#  fit SM
#  plot(vg_i_SM)
vgt_SM  <- vgm(psill=0.05, model="Wav", range=80, nugget=0.03,anis = c(0,0.4),
               add.to = vgm(psill=0.15, range=350,  nugget=0.1, model="Exp"))
  
  #plot(vg_i_SM, model=vgt_SM)

  #fit UM
#  plot(vg_i_UM)
vgt_UM  <- vgm(psill=0.05, model="Exp", range=150, nugget=0.02,anis = c(120,0.3),
               add.to = vgm(psill=0.01, range=50,  nugget=0, model="Wav",anis = c(30,0.3)))


 #update gstats
  
  for (i in Lcodes){
  assign(paste0("gs_i_", i),
         gstat(id=i,
               formula = (Lcode==paste0(i))~1,
               locations = ~X+Y,
               data=dt,
               model = get(paste0("vgt_",i)),
               nmax=30)
  )
}

  #kriging
  
  for (i in Lcodes){
  assign(paste0("sim_", i),
         predict(get(paste0("gs_i_", i)),
                     newdata=xy_grid, nsim = 30)
  )
}
## drawing 30 GLS realisations of beta...
## [using conditional Gaussian simulation]
## drawing 30 GLS realisations of beta...
## [using conditional Gaussian simulation]
## drawing 30 GLS realisations of beta...
## [using conditional Gaussian simulation]
## drawing 30 GLS realisations of beta...
## [using conditional Gaussian simulation]


#calculate mean simulation results
  for (i in Lcodes){
  assign(paste0("sim_", i, "_pred"),
         get(paste0("sim_", i)) %>% select(-c(1,2)) %>% apply(., MARGIN = 1, mean) %>% as.data.frame()
  )
  }


# combine estimations
sim_I_probs <- cbind(sim_FZ_pred, sim_SA_pred, sim_SM_pred, sim_UM_pred)
colnames(sim_I_probs) <- paste0(Lcodes, ".pred")

sim_I_probs$Lcode <- sim_I_probs %>% max.col() %>% 
   factor(x = .,levels = c(1,2,3,4),labels = Lcodes) 

sim_I_probs <- cbind(sim_I_probs, sim_SA[,1:2])

#Plotting the results

a <- ggplot(data = sim_I_probs, aes(x=X, y=Y, fill=Lcode))+
  geom_raster()+
  coord_fixed()+
  geom_point(data=dt, inherit.aes = F, aes(x=X, y=Y,  fill=Lcode), size=3, shape=21, alpha=0.7)+
  labs(fill="Lcode", title = "Lcode")

a

It can be seen that the small patches of SA and UM now appear as more “complete” units. This can be attributed to the roughly doubled sampling density, since the full dataset is used here instead of only the training subset.

# 
cos_cs_f_pred <- readRDS(file = "prediction_final.RDS")

# calculate mean values for chemicals
for (i in fkt_new) {
    cos_cs_f_pred[,paste0(i,".mean")] <-  cos_cs_f_pred %>% select(contains(i))%>% exp() %>% apply(.,1, mean) %>% as.vector() 
    cos_cs_f_pred[,paste0(i,".vc")] <-  cos_cs_f_pred %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
   
}
#calculate sum LCode (not used right now)
# for (i in Lcodes) {
# cos_cs_f_pred[,(paste0(i,".sum"))] <-
#   cos_cs_f_pred %>%
#   select(contains(paste0(i, ".sim"))) %>% apply(.,1, median) %>% as.vector()
# }
# cos_cs_f_pred$Lcode.pred <- cos_cs_f_pred %>%
#   select(contains(".sum")) %>% max.col() %>% factor(x = .,levels = c(1,2,3,4),labels = Lcodes)

Interactive plot: Ni and Lcode sampling sites

plotly::plot_ly(title="Ni [%]") %>% 
  add_rasterly_heatmap(cos_cs_f_pred,
                       x= cos_cs_f_pred$X,
                       y= cos_cs_f_pred$Y,
                       z=cos_cs_f_pred$Ni_log.mean,
                       colorscale='Viridis') %>% 
 plotly::add_markers(data = dt, x=~X, y= ~Y, color= dt$Lcode,size=1,
                     marker=list(sizeref=10)) %>%
  plotly::layout(title ="Ni [%] and Lcode sampling points",autosize = F )

It can be seen that the model tends to smooth out the peak values at certain points (well visible when zooming in). This behavior was already apparent during the validation.

11.1 Cumulative distribution


# preparation for the cumulative distribution

cos_cs_f_pred %<>% as.data.table()
#melting dt
dt.m <- melt.data.table(cos_cs_f_pred %>% select(contains(".sim")) %>% 
                          select(contains(fkt_new))) %>% as.data.table()
dt.m$value <- dt.m$value %>% exp()

# defining the maximum and the 0.999 quantile for Mg
Mg_max <- dt.m[variable %like% ("Mg")]$value %>% max() %>% round()
Mg_q999 <- dt.m[variable %like% ("Mg")]$value %>% quantile(.,0.999) %>% round()

In the following, the cumulative grade distributions for the individual elements are shown. The values are truncated at the 0.999 quantile, because the chosen simulation method occasionally produces very high values for a grid node. For example, the maximum for Mg lies at 951 %, which is clearly impossible and can be attributed to the non-compositional method used here; the 0.999 quantile, in contrast, lies at 92 %. The vertical lines in the plots indicate the 0.999 quantile.
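The truncation described above can be sketched as follows. Synthetic lognormal values stand in for the back-transformed simulations; here the extreme values are capped, whereas the plots below simply limit the x-axis:

```r
# Cap simulated values at the 0.999 quantile to tame rare extreme grid nodes
set.seed(42)
vals <- exp(rnorm(1e4, sd = 1.5))  # stand-in for back-transformed simulations
q999 <- quantile(vals, 0.999)      # upper truncation limit
vals_trunc <- pmin(vals, q999)     # values above the quantile are capped
c(max(vals), q999, max(vals_trunc))
```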



temp <- list()
o=0
for (i in fkt_new) {
o=o+1
  temp[[i]] <- 
  ggplot(dt.m[variable %like% (i)], aes(x = value, group = variable)) +
stat_ecdf(pad=F)+
    geom_vline(xintercept = dt.m[variable %like% (i)]$value %>% max())+
    geom_vline(xintercept = dt.m[variable %like% (i)]$value %>% quantile(.,0.999))+
  labs(title=fkt[[o]], x = "content [%]")+
    xlim(0,dt.m[variable %like% (i)]$value %>% quantile(.,0.999))
}

do.call("grid.arrange", c(temp, ncol=3))

11.2 Grade-tonnage Curves

To calculate the grade-cutoff tables, the following custom function is used:



# function to calculate a grade-cutoff table
GraTon.Tab <- function(use, max.quantile=0.999, nsteps = 100, fromzero=F) {
  if (fromzero==T) qm <- 0 # start from 0 ...
  if (fromzero==F) qm <- use %>% min() # ... or from the minimum
  Q <- use %>% quantile(., max.quantile) # upper limit: max.quantile of use
  steps <- seq(from=qm, to=Q, length.out=nsteps) # cutoff steps
  gt <- data.table(cutoff=steps) # basic table
  gt$n <- lapply(steps, function(x)(length(which(use>x)))) %>% unlist() # number of cells above cutoff
  gt$prop <- gt$n/max(gt$n) # proportion relative to the first step
  gt$mean <- lapply(steps, function(x)(mean(use[use>x]))) %>% unlist() # mean grade above cutoff
  gt
}
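As a quick sanity check, a self-contained simplified variant of the same logic (base R only; the function name `grade_tonnage` is illustrative, and `prop` is taken relative to the total number of cells rather than to the first cutoff bin):

```r
# Simplified grade-tonnage table: proportion of cells and mean grade above cutoff
grade_tonnage <- function(use, nsteps = 5, max.quantile = 0.999) {
  steps <- seq(min(use), quantile(use, max.quantile), length.out = nsteps)
  data.frame(
    cutoff = steps,
    prop   = sapply(steps, function(x) mean(use > x)),      # share of cells above cutoff
    mean   = sapply(steps, function(x) mean(use[use > x]))  # mean grade above cutoff
  )
}

set.seed(1)
grade_tonnage(rlnorm(1000, sdlog = 0.5))  # prop falls and mean rises with increasing cutoff
```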

The grade-tonnage curves show the increasing cutoff on the x-axis, while the blue line shows the number of tiles above the respective cutoff. In the 3-D case, this could be converted into tonnage above cutoff. The orange lines show the mean grade that the remaining ore above the given cutoff would have. These numbers allow a quick assessment of a deposit’s feasibility, and they can be used to pre-estimate revenue depending on the chosen cutoff (as ore content can be recalculated into monetary value). For a detailed financial feasibility assessment, however, additional factors such as variable mining costs, processing efficiency, interest, and geometrical constraints need to be incorporated. For an open pit, an automated feasibility assessment can then be carried out (e.g. Lerchs-Grossmann or pseudoflow pit optimization). This can be combined with the Gaussian simulation to obtain information about the financial uncertainty of a given project.

#create individual cutoff tables for all iterations and merge them into one table

temp <- list()
CO_all <- list()
coeff <- NULL
for (i in fkt) {
  #apply to extract all iterations of given element and calculate single cutoff-table
  CO_all[[i]] <- lapply(dt.m[variable %like% (i)]$variable %>% unique(),
       function(x) (GraTon.Tab(use = dt.m[variable == (x)]$value,max.quantile = 0.99))) %>% 
  rbindlist(l = .,idcol = "Nr") #coerce individual tables into one
   
  
  #coefficient for secondary axis transformation
 coeff[paste(i)] <- max(CO_all[[i]]$mean) 
 CO_all[[i]]$mean <- CO_all[[i]]$mean/coeff[paste(i)]
 #formula for secondary axis transformation
frm = as.formula(paste0("~.*",coeff[paste(i)]))

  temp[[i]] <- ggplot(CO_all[[i]], aes(x=cutoff, group = Nr)) +
    geom_line(aes( y= prop), color="blue3")+
    geom_line(aes(y= mean), color = "Orange3")+
  scale_y_continuous(name= "Proportion above cut-off", 
                     sec.axis = sec_axis(trans=frm, name="mean grade [%]"))+
    labs(title = i)+
    theme( axis.text.y.right = element_text(color = "orange3"),
           axis.text.y.left = element_text(color = "blue3"))
  
  }
do.call("grid.arrange", c(temp, ncol=3))

With the computed 30 iterations, it can be seen that the mean grade of Ni follows a relatively tight band ranging from 0.5 % to roughly 2 %. If at least 50 % of the deposit’s tiles are to be mined, a cutoff of 0.4 % can be chosen; in this case, the average grade would be 0.7 %. If a mean grade of at least 1 % must be achieved due to financial restrictions, between 18 % and 25 % of the deposit would remain for mining. The thickness of the blue band indicates the uncertainty in the number of tiles above a given cutoff.

12 Summary

In this report, multiple geostatistical interpolation and modeling methods have been applied to investigate their performance on a resource estimation problem. Three kriging methods and three Gaussian simulation techniques were used to estimate the element contents of a multi-element composition. Generally, all methods showed high prediction errors. This might be attributed to the fact that the dataset comprises a 2-D slice of a 3-D structure, so valuable information about the spatial continuity is missing. The kriging methods, however, generally achieved slightly lower prediction errors.

What additionally stands out is that the more complex methods did not necessarily perform better on the validation data. The omnidirectional logratio compositional model performed slightly better than the mixed cokriging model that takes anisotropy into consideration. The best performance, however, was achieved by the general omnidirectional model that does not take the actual rock type into account. This is surprising, because the exploratory data analysis clearly pointed towards a significant influence of the rock type on the element composition. It also illustrates that every additional variable incorporated into a model generally increases its uncertainty.

For the estimation of resource variability, Gaussian simulation methods are necessary. Therefore, one Gaussian model was chosen for the final resource estimation: VS1, due to its performance advantages. VS1 is an omnidirectional Gaussian cosimulation with log-transformed values.

The variability of the resource estimation could be shown with the help of cumulative distribution graphs and grade-tonnage graphs. Depending on the desired or necessary cutoff, variations in the minable resources of e.g. 7 % at a cutoff of 0.5 % Ni have to be taken into account. Mining 7 % more or less ore is a considerable variation over the life of a mine, where efficiency optimizations of a few percent can have great monetary effects. As such, Gaussian simulation can quantify the financial risk associated with a given mining operation to a much greater extent than kriging, which gives no range of resource uncertainty. Kriging methods, however, return the best single estimate for the given situation, as shown by their lower prediction errors.

13 Sources

[1] - Talebi, H., Mueller, U., Tolosana-Delgado, R. et al. Geostatistical Simulation of Geochemical Compositions in the Presence of Multiple Geological Units: Application to Mineral Resource Evaluation. Math Geosci 51, 129–153 (2019). https://doi.org/10.1007/s11004-018-9763-9

[2] - Makowski, D.; Ben-Shachar, M.; Patil, I.; Lüdecke, D. (2020): Methods and Algorithms for Correlation Analysis in R. In: JOSS 5 (51), S. 2306. DOI: 10.21105/joss.02306.

[3] - Tolosana-Delgado, R. and Menzel P., Block Course: Practical Geostatistics, TU Bergakademie Freiberg, 2021

[4] - Friendly, M.; Working with categorical data with R and the vcd and vcdExtra packages, York University, Toronto, 2021. https://cran.r-project.org/web/packages/vcdExtra/vignettes/vcd-tutorial.pdf

[5] - Olsen, L.R.; The available metrics in cvms https://cran.r-project.org/web/packages/cvms/vignettes/available_metrics.html#multinomial-metrics

[6] - Journel, A.G., Rossi, M.E. When do we need a trend model in kriging?. Math Geol 21, 715–739 (1989). https://doi.org/10.1007/BF00893318

14 Annex (unused code)


# used for cosimulation(not used right now)
# a <- ggplot(data = cos_cs_f_pred, aes(x=X, y=Y, fill=(cos_cs_f_pred$Lcode.pred %>% as.character())))+
#   geom_raster()+
#   coord_fixed()+
# geom_point(data=dt, inherit.aes = F, aes(x=X, y=Y,  fill=Lcode %>% as.character()), size=3, shape=21, alpha=0.5)+
#   labs(fill="Lcode", title = "Lcode")
# 
# a